simplezarr.utils.chunkpool
Support for managing a multi-user pool of chunks, with caching and lifetime handlers.
The ChunkPool object keeps track of individual chunks, which you manage by getting and dropping them. The pool supports multiple ‘users’ and only destroys a chunk when no user holds it anymore. Caching is also supported: the pool can retain unused chunks, so that getting these chunks again later is much faster. The pool also supports callbacks for a chunk’s lifetime events (load, drop, destroy).
The ChunkManager is a thin class that leverages the multi-user support and the callbacks. Subclass it to manage chunks throughout their lifetime. This gives a clear separation between the code that decides which chunks to load and drop, and the code that determines what actually happens to those chunks.
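The ref-counting and caching semantics described above can be modeled with a small toy. This is a sketch under stated assumptions, not simplezarr's implementation: `ToyPool`, its `loader` argument, storing refs as a set per chunk, and using `collections.OrderedDict` for the FIFO cache are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Set, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (level, index)


class ToyPool:
    """Hypothetical model of a multi-user chunk pool with a FIFO cache."""

    def __init__(self, loader: Callable[[Key], Any], cache_size: int = 0):
        self._loader = loader                  # hypothetical: key -> chunk data
        self._cache_size = cache_size
        self._active: Dict[Key, Any] = {}      # chunks held by at least one ref
        self._refs: Dict[Key, Set[str]] = {}   # key -> refs currently holding it
        self._cache: "OrderedDict[Key, Any]" = OrderedDict()  # unreferenced, oldest first
        self.load_count = 0
        self.destroyed: List[Key] = []         # destroyed keys, in order

    def get_chunk(self, level: int, index: Tuple[int, ...], ref: str = "pool") -> Any:
        key = (level, index)
        if key not in self._active:
            if key in self._cache:
                # Cache hit: revive the chunk without loading it again.
                self._active[key] = self._cache.pop(key)
            else:
                self._active[key] = self._loader(key)
                self.load_count += 1
            self._refs[key] = set()
        self._refs[key].add(ref)
        return self._active[key]

    def drop_chunk(self, level: int, index: Tuple[int, ...], ref: str = "pool") -> None:
        key = (level, index)
        refs = self._refs.get(key)
        if refs is None or ref not in refs:
            return  # this ref does not hold the chunk
        refs.remove(ref)
        if refs:
            return  # other users still hold the chunk
        del self._refs[key]
        self._cache[key] = self._active.pop(key)
        # Destroy the oldest unreferenced chunks beyond cache_size (FIFO).
        while len(self._cache) > self._cache_size:
            old_key, _ = self._cache.popitem(last=False)
            self.destroyed.append(old_key)


pool = ToyPool(lambda key: f"data for {key}", cache_size=1)
pool.get_chunk(0, (0, 0), ref="viewer")
pool.get_chunk(0, (0, 0), ref="stats")
pool.drop_chunk(0, (0, 0), ref="viewer")  # "stats" still holds the chunk
pool.drop_chunk(0, (0, 0), ref="stats")   # no refs left: moved to the cache
pool.get_chunk(0, (0, 0))                 # served from the cache, no reload
```

With `cache_size=0`, a chunk that loses its last ref is moved to the cache and immediately evicted, which matches the default "destroy when unused" behaviour.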
- class simplezarr.utils.chunkpool.ChunkPool(multiscale_info: MultiscaleInfo, cache_size: int = 0)
Bases: object
An object to get access to individual chunks, with support for caching and parallel loading.
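- Parameters:
multiscale_info – MultiscaleInfo The multiscale_info object for which to create a pool.
cache_size – int The maximum number of unreferenced chunks to keep in memory. Default 0 (no caching).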
Cache behavior
Chunks that are dropped and have zero references (i.e. users) are normally destroyed. If cache_size > 0, up to that many of these chunks are kept in memory, so that getting such a chunk again later is very fast. When the cache is full, the oldest cached chunk is destroyed first (FIFO). Caching therefore delays the destruction of the ChunkLocation objects.
- classmethod from_zarr_node(zarr_node: ZarrNode, cache_size: int = 0) list[ChunkPool]
Create a ChunkPool for every (multiscale) image in the given Zarr node.
- property multiscale_info: MultiscaleInfo
The MultiscaleInfo object that represents the information on the multiscale image.
- property memory_usage: int
The current memory usage in bytes.
The number is the sum of the (uncompressed) sizes of the arrays representing the chunks.
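As a worked example of this sum, the uncompressed size of a NumPy array is its `nbytes` (number of elements times item size), regardless of how compressed the chunk is on disk. The helper `memory_usage` below is hypothetical and only assumes that the property adds up `nbytes` over the chunk arrays.

```python
import numpy as np


def memory_usage(chunks) -> int:
    """Hypothetical helper: sum of the uncompressed sizes (in bytes) of the chunk arrays."""
    return sum(arr.nbytes for arr in chunks)


chunks = [
    np.zeros((64, 64), dtype=np.uint16),   # 64 * 64 * 2 bytes = 8192
    np.zeros((32, 64), dtype=np.float32),  # 32 * 64 * 4 bytes = 8192
]
total = memory_usage(chunks)  # 16384 bytes
```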
- enable_async_load_handlers(call_soon_threadsafe: Callable | Literal['asyncio', 'none'])
Set the pool up to asynchronously call the load-handlers.
In normal operation, the load-handlers are only invoked when the ChunkLocation objects are waited upon, either via chunk_location.wait() or chunk_pool.wait_for_chunks_to_load(). With async enabled, the load-handlers are fired as soon as the data is loaded. This behaviour is especially intended for interactive applications such as data viewers.
The async behaviour does not depend on asyncio, but can be used with any framework that can provide a call_soon_threadsafe() function. That said, asyncio is the most common use-case, so one can simply do enable_async_load_handlers("asyncio"). Use enable_async_load_handlers("none") to turn it off again.
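The general pattern that a call_soon_threadsafe() function enables can be sketched with the standard library alone: a worker thread finishes loading, and the handler is handed to the event loop instead of running in the worker. This sketch does not use simplezarr; `load_chunk` and `on_load` are hypothetical, and the assumption is only that the pool does something similar internally.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def load_chunk(key):
    # Hypothetical stand-in for reading and decoding one chunk.
    return f"data for {key}"


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    done = asyncio.Event()
    results = []

    def on_load(key, data):
        # Runs in the event-loop thread, so it may safely touch loop/UI state.
        results.append((key, data, threading.get_ident() == loop_thread))
        if len(results) == 2:
            done.set()

    with ThreadPoolExecutor() as executor:
        for key in [(0, (0, 0)), (0, (0, 1))]:
            fut = executor.submit(load_chunk, key)
            # The done-callback may fire in a worker thread; hop to the loop.
            fut.add_done_callback(
                lambda f, key=key: loop.call_soon_threadsafe(on_load, key, f.result())
            )
        await done.wait()  # handlers fire without anyone calling wait() on a chunk
    return results


results = asyncio.run(main())
```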
- destroy()
Drop and destroy all chunks.
- get_chunk(level: int, index: tuple[int, ...], ref: str = 'pool', *, load_handler=None, drop_handler=None, destroy_handler=None) ChunkLocation
Get a ChunkLocation object.
- Parameters:
level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
ref – str The reference that identifies the code requesting the chunk. Default ‘pool’. The pool uses it to ref-count chunk usage and destroys the chunk when no refs are left. Also see the ChunkManager.
load_handler, drop_handler, destroy_handler – callable Functions that will be called at specific lifetime events of the chunk.
- Returns:
chunk_location – A representation of the requested chunk. The corresponding data is being loaded but may not be ready yet. If async loading is enabled, the load handler will be called as soon as the data arrives.
- Return type:
ChunkLocation
Individual chunks can be loaded synchronously using:
    chunk_location.wait()
    data = chunk_location.data
After getting multiple chunks, it’s easy to load them in parallel:
    chunk_pool.wait_for_chunks_to_load()
This is equivalent to:
    chunk_locations = [...]
    for chunk_location in chunk_locations:
        chunk_location.wait()
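Waiting on chunks one by one still loads them in parallel, because every load is already running by the time the first wait happens. The sketch below shows this with plain `concurrent.futures`; `load_chunk` and its 0.2 s delay are hypothetical stand-ins for I/O-bound chunk reads, not simplezarr code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_chunk(key):
    # Hypothetical stand-in for an I/O-bound chunk read.
    time.sleep(0.2)
    return key


keys = [(0, (i, 0)) for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.perf_counter()
    # "Getting" the chunks: every load is submitted before anything is waited on.
    futures = [executor.submit(load_chunk, key) for key in keys]
    # Waiting sequentially: total time is roughly one load, not four.
    results = [fut.result() for fut in futures]
    elapsed = time.perf_counter() - start
```

The important design point is the order: submit everything first, then wait. Interleaving submit and wait per chunk would serialize the loads.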
- drop_chunk(level: int, index: tuple[int, ...], ref: str = 'pool') None
Release a chunk by its index.
- Parameters:
level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
ref – str The reference that identifies the code that requested the chunk. Default ‘pool’. This must be the same value as when get_chunk() was called.
This drops the chunk, invoking any drop handlers. When the chunk has no more refs, the chunk is destroyed. When the code has a single user,
- iter_chunks() Generator[ChunkLocation, None, None]
Iterate over all chunks currently in the pool, both those in use and those that are cached.
- wait_for_chunks_to_load()
Wait for all requested chunks to load their data.
- class simplezarr.utils.chunkpool.ChunkManager(pool)
Bases: object
A simple wrapper for a ChunkPool that represents one specific ‘user’ of the pool.
To use this class, subclass it and implement the on_load, on_drop and on_destroy methods.
- get_chunk(level: int, index: tuple[int, ...])
Get a ChunkLocation object.
- Parameters:
level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
- Returns:
chunk_location – A representation of the requested chunk. The corresponding data is being loaded but may not be ready yet. If async loading is enabled, the on_load method will be called as soon as the data arrives.
- Return type:
ChunkLocation
The manager’s on_load, on_drop, and on_destroy methods are automatically registered as handlers.
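The subclassing pattern can be sketched without simplezarr by pairing a hypothetical handler-calling pool with a manager base class that passes its own methods as handlers under a per-instance ref. `ToyHandlerPool`, `ToyManager`, `TextureManager`, and the `id(self)`-based ref are all assumptions for illustration; destroy handling is omitted to keep the sketch short, and the toy passes the chunk key where the real handlers receive a ChunkLocation.

```python
from collections import defaultdict


class ToyHandlerPool:
    """Hypothetical pool that calls per-ref handlers on load and drop."""

    def __init__(self):
        self._handlers = defaultdict(dict)  # key -> {ref: (load_handler, drop_handler)}

    def get_chunk(self, level, index, ref="pool", *, load_handler=None, drop_handler=None):
        key = (level, index)
        self._handlers[key][ref] = (load_handler, drop_handler)
        if load_handler is not None:
            load_handler(key)  # toy: the data "arrives" immediately
        return key

    def drop_chunk(self, level, index, ref="pool"):
        key = (level, index)
        _, drop_handler = self._handlers[key].pop(ref, (None, None))
        if drop_handler is not None:
            drop_handler(key)


class ToyManager:
    """Hypothetical manager: one 'user' of the pool, identified by its own ref."""

    def __init__(self, pool):
        self._pool = pool
        self._ref = f"manager-{id(self)}"

    def get_chunk(self, level, index):
        # Register this manager's methods as the lifetime handlers.
        return self._pool.get_chunk(
            level, index, self._ref, load_handler=self.on_load, drop_handler=self.on_drop
        )

    def drop_chunk(self, level, index):
        self._pool.drop_chunk(level, index, self._ref)

    def on_load(self, chunk_location):
        pass

    def on_drop(self, chunk_location):
        pass


class TextureManager(ToyManager):
    """Example subclass: only decides what happens to chunks, not which to load."""

    def __init__(self, pool):
        super().__init__(pool)
        self.uploaded = set()

    def on_load(self, chunk_location):
        self.uploaded.add(chunk_location)

    def on_drop(self, chunk_location):
        self.uploaded.discard(chunk_location)


manager = TextureManager(ToyHandlerPool())
manager.get_chunk(0, (0, 0))
manager.get_chunk(1, (0, 0))
manager.drop_chunk(0, (0, 0))
```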
- drop_chunk(level: int, index: tuple[int, ...]) None
Release a chunk by its index.
- Parameters:
level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
This drops the chunk, invoking any drop handlers. When the chunk has no more refs, the chunk is destroyed.
- on_load(chunk_location: ChunkLocation)
Method that gets called when a chunk is loaded. Override this in your subclass.
- on_drop(chunk_location: ChunkLocation)
Method that gets called when a chunk is dropped. Override this in your subclass.
- on_destroy(chunk_location: ChunkLocation)
Method that gets called when a chunk is destroyed. Override this in your subclass.
- class simplezarr.utils.chunkpool.ChunkLocation(scale_info: ScaleInfo, index: tuple[int, ...])
Bases: object
An object that represents a chunk location.
- property future
The concurrent.futures.Future for loading the data.
- property data: ndarray
The data (numpy array) for this chunk.
When this property is accessed before the data is loaded, a RuntimeError is raised.
- wait()
Synchronously wait for the chunk’s data to load.
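The documented relationship between future, data and wait() can be mirrored in a few lines. `ToyChunkLocation` is hypothetical and assumes only what is stated above: data is backed by a `concurrent.futures.Future`, accessing it early raises a RuntimeError, and wait() blocks until the result is set. Creating a bare `Future()` directly is done here purely for demonstration; normally an executor creates it.

```python
from concurrent.futures import Future

import numpy as np


class ToyChunkLocation:
    """Hypothetical stand-in for ChunkLocation: wraps a Future for the chunk data."""

    def __init__(self, index: tuple):
        self.index = index
        self._future: Future = Future()

    @property
    def future(self) -> Future:
        return self._future

    @property
    def data(self) -> np.ndarray:
        # Mirror the documented behaviour: accessing data never blocks implicitly.
        if not self._future.done():
            raise RuntimeError("chunk data is not loaded yet; call wait() first")
        return self._future.result()

    def wait(self) -> None:
        # Block until the loader sets the result (re-raises any loader error).
        self._future.result()


loc = ToyChunkLocation((0, 0))
try:
    loc.data
except RuntimeError as err:
    print("not ready:", err)
loc.future.set_result(np.ones((2, 2)))  # normally done by a loader thread
loc.wait()
print(loc.data.shape)
```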