simplezarr.utils.chunkpool

Support for managing a pool of chunks, with multi-user access, caching, and lifetime handlers.

The ChunkPool object keeps track of the individual chunks and makes them easier to manage: code gets the chunks it needs and drops them when it is done with them. The pool supports multiple ‘users’, and only destroys a chunk when no user uses it anymore. Caching is also supported, enabling the pool to retain unused chunks so that acquiring them again later is much faster. The pool also supports callbacks for a chunk’s lifetime events (load, drop, destroy).
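To make the multi-user semantics concrete, here is a minimal, self-contained model of reference counting with lifetime callbacks. It is an illustrative sketch, not simplezarr's implementation: the class ToyPool, its internals, and the use of a plain (level, index) key in place of a ChunkLocation are all hypothetical. It only mirrors the documented idea that each user holds a named ref, and that a chunk is destroyed once its last ref is dropped (caching is left out here).

```python
from typing import Callable


class ToyPool:
    """Hypothetical model of a ref-counted chunk pool (no caching, no I/O)."""

    def __init__(self, on_load: Callable, on_drop: Callable, on_destroy: Callable):
        self._refs: dict[tuple, set[str]] = {}  # chunk key -> refs that use it
        self._on_load = on_load
        self._on_drop = on_drop
        self._on_destroy = on_destroy

    def get_chunk(self, level: int, index: tuple[int, ...], ref: str = "pool"):
        key = (level, index)
        if key not in self._refs:
            self._refs[key] = set()
            self._on_load(key)  # first user: the chunk gets "loaded"
        self._refs[key].add(ref)
        return key

    def drop_chunk(self, level: int, index: tuple[int, ...], ref: str = "pool"):
        key = (level, index)
        refs = self._refs[key]
        refs.discard(ref)
        self._on_drop(key)
        if not refs:  # no users left: destroy the chunk
            del self._refs[key]
            self._on_destroy(key)


events = []
pool = ToyPool(
    on_load=lambda key: events.append(("load", key)),
    on_drop=lambda key: events.append(("drop", key)),
    on_destroy=lambda key: events.append(("destroy", key)),
)
pool.get_chunk(0, (0, 0), ref="viewer")
pool.get_chunk(0, (0, 0), ref="analysis")  # same chunk, second user
pool.drop_chunk(0, (0, 0), ref="viewer")  # still used by "analysis"
pool.drop_chunk(0, (0, 0), ref="analysis")  # last user gone: destroyed
print([name for name, _ in events])  # ['load', 'drop', 'drop', 'destroy']
```

The chunk survives the first drop because the "analysis" ref still holds it; only the second drop triggers the destroy callback.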

The ChunkManager is a thin class that builds on the pool's multi-user support and callbacks. Subclass it to manage chunks throughout their lifetime. This gives a clear separation between the code that determines which chunks to load and drop, and the code that determines what actually happens to those chunks.

class simplezarr.utils.chunkpool.ChunkPool(multiscale_info: MultiscaleInfo, cache_size: int = 0)

Bases: object

An object that provides access to individual chunks, with support for caching and parallel loading.

Parameters:

multiscale_info – MultiscaleInfo The MultiscaleInfo object for which to create a pool.
cache_size – int The maximum number of unused chunks to keep in memory. Default 0 (no caching).

Cache behavior

Chunks that are dropped and have zero references (i.e. users) are normally destroyed. If cache_size > 0, up to that number of such chunks are kept in memory, so that getting one of them again later is much faster. When the cache is full, the oldest cached chunks are destroyed first (i.e. FIFO). Caching therefore delays the destruction of the ChunkLocation objects.
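The FIFO eviction rule can be modeled with collections.OrderedDict, whose insertion order doubles as the age of each cached chunk. This is a hypothetical sketch of the policy only (the names FifoChunkCache, park, take and the destroy callback are made up for illustration); it is not simplezarr's code.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class FifoChunkCache:
    """Hypothetical FIFO cache for chunks whose last ref was dropped."""

    def __init__(self, cache_size: int, destroy: Callable[[Any], None]):
        self.cache_size = cache_size
        self._destroy = destroy
        self._chunks: OrderedDict = OrderedDict()  # insertion order == age

    def park(self, key: Hashable, chunk: Any) -> None:
        """Keep an unused chunk around, evicting the oldest if over capacity."""
        if self.cache_size <= 0:
            self._destroy(chunk)  # caching disabled: destroy right away
            return
        self._chunks[key] = chunk
        while len(self._chunks) > self.cache_size:
            _, oldest = self._chunks.popitem(last=False)  # oldest first (FIFO)
            self._destroy(oldest)

    def take(self, key: Hashable) -> Optional[Any]:
        """Remove and return a cached chunk, or None if it is not cached."""
        return self._chunks.pop(key, None)


destroyed = []
cache = FifoChunkCache(cache_size=2, destroy=destroyed.append)
for i in range(3):
    cache.park((0, (i,)), f"chunk-{i}")
print(destroyed)  # ['chunk-0']
print(cache.take((0, (2,))))  # chunk-2 (reused without reloading)
```

With cache_size=2, parking a third chunk evicts the oldest one, while a recently parked chunk can be handed back out directly.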

classmethod from_zarr_node(zarr_node: ZarrNode, cache_size: int = 0) → list[ChunkPool]: Create a ChunkPool for every (multiscale) image in the given Zarr node.

property multiscale_info: MultiscaleInfo: The MultiscaleInfo object that represents the information on the multiscale image.

property memory_usage: int

The current memory usage in bytes.

The number is the sum of the (uncompressed) sizes of the arrays representing the chunks.
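As a worked example of this arithmetic: the uncompressed size of a chunk is the number of elements (the product of its shape) times the number of bytes per element, and the memory usage is the sum over all chunks. The shapes and the uint16 dtype below are made-up values for illustration.

```python
import math

# Hypothetical chunks held by a pool: (chunk shape, bytes per element)
chunks = [
    ((64, 64, 64), 2),  # uint16: 2 bytes per element
    ((64, 64, 64), 2),
    ((32, 128, 128), 2),  # e.g. a chunk from a level with another chunk shape
]


def chunk_nbytes(shape: tuple[int, ...], itemsize: int) -> int:
    """Uncompressed size of one chunk in bytes."""
    return math.prod(shape) * itemsize


memory_usage = sum(chunk_nbytes(shape, itemsize) for shape, itemsize in chunks)
print(memory_usage)  # 2097152 (i.e. 2 MiB)
```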

enable_async_load_handlers(call_soon_threadsafe: Callable | Literal['asyncio', 'none'])

Set the pool up to asynchronously call the load-handlers.

In normal operation, the load-handlers only get invoked when the ChunkLocation objects are waited upon, either by chunk_location.wait() or chunk_pool.wait_for_chunks_to_load(). With async enabled, the load-handlers are fired as soon as the data is loaded. This behaviour is especially intended for interactive applications such as data viewers.

The async behaviour does not depend on asyncio; it can be used with any framework that provides a call_soon_threadsafe() function. That said, asyncio is the most common use case, so one can simply do enable_async_load_handlers("asyncio"). Use enable_async_load_handlers("none") to turn it off again.
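The general pattern behind this can be illustrated with plain asyncio: a worker thread loads the data and then uses the event loop's call_soon_threadsafe() to schedule the handler on the loop's own thread. This sketch assumes that is roughly how the pool uses the function it is given; it is not simplezarr's internals, and load_chunk and on_load are hypothetical stand-ins.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def load_chunk(index):
    """Hypothetical stand-in for reading and decoding a chunk."""
    return [index] * 4


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    handled = []
    all_done = asyncio.Event()

    def on_load(index, data):
        # Scheduled via call_soon_threadsafe, so this runs on the loop thread.
        handled.append((index, data, threading.get_ident() == loop_thread))
        if len(handled) == 3:
            all_done.set()

    def load_then_notify(index):
        data = load_chunk(index)  # runs in a worker thread
        loop.call_soon_threadsafe(on_load, index, data)

    with ThreadPoolExecutor(max_workers=2) as executor:
        for i in range(3):
            executor.submit(load_then_notify, i)
        await all_done.wait()  # the loop stays free to run handlers
    return handled


handled = asyncio.run(main())
print(sorted(handled))
```

Every handler reports True for "ran on the loop thread", which is what makes it safe for a handler to update loop-owned state such as a viewer's scene.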

destroy(): Drop and destroy all chunks.

get_chunk(level: int, index: tuple[int, ...], ref: str = 'pool', *, load_handler=None, drop_handler=None, destroy_handler=None) → ChunkLocation

Get a ChunkLocation object.

Parameters:

level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
ref – str The reference to identify the code that requests the chunk. Default ‘pool’. This is used by the pool to ref-count the chunk usage and destroy the chunk when there are no more refs left. Also see the ChunkManager.
load_handler, drop_handler, destroy_handler – callable Optional functions that are called when the chunk is loaded, dropped, and destroyed, respectively.

Returns:

ChunkLocation: A representation of the requested chunk. The corresponding data is being loaded but may not be ready yet. If async loading is enabled, the load handler will be called as soon as the data arrives.

Return type:

ChunkLocation

Individual chunks can be loaded synchronously using:

chunk_location.wait()
data = chunk_location.data

After getting multiple chunks, it’s easy to wait for all of them at once, while their data is loaded in parallel:

chunk_pool.wait_for_chunks_to_load()

This is equivalent to:

chunk_locations = [...]
for chunk_location in chunk_locations:
    chunk_location.wait()
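Since ChunkLocation.future is a concurrent.futures.Future, the "request everything first, then wait" pattern can be shown with the standard library alone. The sketch below uses a thread pool and a hypothetical load_chunk function in place of the pool's own loading machinery; it illustrates why the loads overlap: all of them were submitted before the first wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def load_chunk(index):
    """Hypothetical loader that takes a while (e.g. network I/O)."""
    time.sleep(0.2)
    return sum(index)


with ThreadPoolExecutor(max_workers=4) as executor:
    t0 = time.perf_counter()
    # Requesting chunks only submits work; nothing blocks yet.
    futures = [executor.submit(load_chunk, (0, i)) for i in range(4)]
    # Waiting afterwards: the loads overlap, so this takes about 0.2 s, not 0.8 s.
    done, not_done = wait(futures)
    elapsed = time.perf_counter() - t0

results = [f.result() for f in futures]
print(results)  # [0, 1, 2, 3]
```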

drop_chunk(level: int, index: tuple[int, ...], ref: str = 'pool') → None

Release a chunk by its index.

Parameters:

level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.
ref – str The reference to identify the code that requested the chunk. Default ‘pool’. This must be the same value as when get_chunk() was called.

This drops the chunk, invoking any drop handlers. When the chunk has no more refs, it is destroyed (or kept in the cache, if cache_size > 0). When the code has a single user,

iter_chunks() → Generator[ChunkLocation, None, None]: Iterate over all currently loaded chunks, both loaded and cached.

wait_for_chunks_to_load(): Wait for all requested chunks to load their data.

class simplezarr.utils.chunkpool.ChunkManager(pool)

Bases: object

A simple wrapper for a ChunkPool that represents one specific ‘user’ of the pool.

To use this class, subclass it and implement the on_load, on_drop and on_destroy methods.

get_chunk(level: int, index: tuple[int, ...])

Get a ChunkLocation object.

Parameters:

level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.

Returns:

ChunkLocation: A representation of the requested chunk. The corresponding data is being loaded but may not be ready yet. If async loading is enabled, the on_load method will be called as soon as the data arrives.

Return type:

ChunkLocation

The manager's on_load, on_drop, and on_destroy methods are automatically registered as handlers.

drop_chunk(level: int, index: tuple[int, ...]) → None

Release a chunk by its index.

Parameters:

level – int The scale level for the requested chunk.
index – tuple[int, …] The index for the requested chunk.

This drops the chunk, invoking any drop handlers. When the chunk has no more refs, it is destroyed (or kept in the pool's cache, if the pool has cache_size > 0).

on_load(chunk_location: ChunkLocation): Method that gets called when a chunk is loaded. Override this in your subclass.

on_drop(chunk_location: ChunkLocation): Method that gets called when a chunk is dropped. Override this in your subclass.

on_destroy(chunk_location: ChunkLocation): Method that gets called when a chunk is destroyed. Override this in your subclass.

class simplezarr.utils.chunkpool.ChunkLocation(scale_info: ScaleInfo, index: tuple[int, ...])

Bases: object

An object that represents a single chunk: its location in the multiscale image and, once loaded, its data.

property scale_info: ScaleInfo: The info for the scale that this chunk is part of.

property level: int: The integer level that this chunk belongs to.

property index: tuple[int, ...]: The index of this chunk.

property nbytes: int: The size of the chunk in (uncompressed) bytes

property future: The concurrent.futures.Future for loading the data.

property data: ndarray

The data (numpy array) for this chunk.

When this property is accessed before the data is loaded, a RuntimeError is raised.

property refs: set[str]: A set of references that currently use this chunk.

wait(): Synchronously wait for the chunk’s data to load.