Description
The goal
I'd like to improve Bdv-Playground and make it capable of handling really big data with low RAM.
Why it's not ideal currently
It mostly works, but it sometimes fails because some cached sources use a SoftRef policy, which does not clear memory early enough. When memory is nearly full, the JVM spends most of its time in GC calls, which makes the GUI unresponsive and me pretty annoyed. Part of the problem also comes from the absence of any ordering guarantee when cached data is removed: the 'cell' that was accessed least recently is not necessarily the first one removed (see this extract from https://www.baeldung.com/java-soft-references):
Though, no guarantees are placed upon the time when a soft reference gets cleared or the order in which a set of such references to different objects get cleared.
As a rule, JVM implementations choose between cleaning of either recently-created or recently-used references.
Where are soft references generated in Bdv-Playground?
I think the main issue in Bdv-Playground comes from the class RAIHelper, which has a SoftRef policy. RAIHelper is used by default in Bdv-Playground for all cached sources that do not come from a SpimDataset (Resampled sources, Fused sources).
How to solve the issue?
1. Quick fix
I could probably just put a BoundedSoftRef policy with a fixed number of cells in RAIHelper, but one of my design problems is that I make a cache for each 3D RAI (one cache per resolution level and per timepoint). So if I have 50 timepoints and 5 resolution levels, I create 250 different caches. If I put a bound of 10 cells on each cache, I in fact have a max bound of 2500 cells.
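To make the arithmetic concrete, here is a tiny sketch of how the effective bound grows when each per-RAI cache gets its own limit. The class and method names are made up for illustration and are not part of Bdv-Playground.

```java
public class CacheBoundEstimate {

    /**
     * Worst-case number of cells kept in memory when every
     * (timepoint, resolution level) pair has its own bounded cache.
     */
    static long totalCellBound(int timepoints, int resolutionLevels, int cellsPerCache) {
        long caches = (long) timepoints * resolutionLevels; // one cache per 3D RAI
        return caches * cellsPerCache;
    }

    public static void main(String[] args) {
        // 50 timepoints x 5 levels = 250 caches, each bounded to 10 cells
        System.out.println(totalCellBound(50, 5, 10)); // prints 2500
    }
}
```

So the per-cache bound cannot be chosen independently of the number of timepoints and levels, which is what makes the quick fix awkward.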
2. Better fix
2.a. A cache for each source
It would be better to have at least a single cache for each Source. Like in the AbstractSpimSource implementation, a key of the cache should contain all the information to access a cached cell (timepoint, level, gridlocation). But I'm a bit lost in all the cache API (cache loader, cache, access flags...)
2.b. A global cache for all SourceAndConverter registered in bdv-playground
I was wondering whether it was possible to implement a single cache that would contain all the cached cells from all bdv-playground sources. Basically a key of this cache would identify:
- the SourceAndConverter object
- timepoint
- resolution level
- grid location
This would make it possible to invalidate all caches at once, programmatically, if necessary. Also, a single bounded soft ref policy could be applied to the whole cache and act upon all sources.
There is a potential memory leak issue, but I think it should be ok: if a SourceAndConverter object is kept in memory because of its remaining cached cells, the order guarantee of BoundedSoftRef should sooner or later get rid of those cells. Also, when a Source is removed from Bdv-Playground, we could make sure to invalidate its cached cells in the global cache.
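To illustrate what I have in mind for 2.b., here is a rough sketch using only the JDK. It is not the imglib2 cache API: `CellKey`, `BoundedCellCache` and `invalidateSource` are hypothetical names, the source is represented by a plain `Object` compared by identity, and a simple LRU map with strong references stands in for a real bounded soft ref cache.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class GlobalCellCacheSketch {

    /** Hypothetical key: identifies one cell of one source. */
    public static final class CellKey {
        final Object source;        // stands in for the SourceAndConverter, compared by identity
        final int timepoint;
        final int level;
        final long[] gridLocation;

        public CellKey(Object source, int timepoint, int level, long... gridLocation) {
            this.source = source;
            this.timepoint = timepoint;
            this.level = level;
            this.gridLocation = gridLocation.clone();
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CellKey)) return false;
            CellKey k = (CellKey) o;
            return source == k.source && timepoint == k.timepoint && level == k.level
                    && Arrays.equals(gridLocation, k.gridLocation);
        }

        @Override public int hashCode() {
            int h = System.identityHashCode(source);
            h = 31 * h + timepoint;
            h = 31 * h + level;
            return 31 * h + Arrays.hashCode(gridLocation);
        }
    }

    /** Plain LRU map with one bound shared by all sources. Not thread-safe. */
    public static final class BoundedCellCache<V> {
        private final LinkedHashMap<CellKey, V> map;

        public BoundedCellCache(final int maxCells) {
            // accessOrder = true: the eldest entry is the least recently accessed one
            this.map = new LinkedHashMap<CellKey, V>(16, 0.75f, true) {
                @Override protected boolean removeEldestEntry(Map.Entry<CellKey, V> eldest) {
                    return size() > maxCells;
                }
            };
        }

        public V get(CellKey key) { return map.get(key); }
        public void put(CellKey key, V cell) { map.put(key, cell); }
        public int size() { return map.size(); }

        /** Drops every cell of one source, e.g. when it is removed from Bdv-Playground. */
        public void invalidateSource(Object source) {
            map.keySet().removeIf(k -> k.source == source);
        }

        /** Drops all cells of all sources at once. */
        public void invalidateAll() { map.clear(); }
    }

    public static void main(String[] args) {
        Object sourceA = new Object(), sourceB = new Object();
        BoundedCellCache<String> cache = new BoundedCellCache<>(2);
        cache.put(new CellKey(sourceA, 0, 0, 0L, 0L, 0L), "a0");
        cache.put(new CellKey(sourceB, 0, 0, 0L, 0L, 0L), "b0");
        cache.get(new CellKey(sourceA, 0, 0, 0L, 0L, 0L));        // touch a0, so b0 becomes eldest
        cache.put(new CellKey(sourceB, 1, 0, 0L, 0L, 0L), "b1");  // over the bound: b0 is evicted
        cache.invalidateSource(sourceB);                          // only a0 remains
        System.out.println(cache.size());
    }
}
```

The key holds a strong reference to the source, which is exactly the leak risk mentioned above. In this sketch it is handled by the shared bound plus the explicit `invalidateSource` call on removal.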
So: do you think 2.b. is a good option? If yes, I'll probably need a bit of help to get started because, as I wrote, I'm a bit lost in the API.
Optionally:
- I'd like to know the size retained by the cache of a SourceAndConverter (probably very difficult)
ping @tischi @tpietzsch