API
The lightweight core of simplezarr.
Stores
Zarr data is stored as a set of key-value pairs. This can be a directory of files on your hard drive, a Python dict in memory, or a remote resource accessible over the internet.
Stores give access to that data in a consistent way, so that the code that reads and writes Zarr data does not have to care how or where the data is stored. Multiple implementations are provided, as well as wrapper stores for various purposes.
- class simplezarr.stores.ReadableStore
Bases: BaseStore
A store that can read keys.
Partial getting is implemented in this base class by using .get() and then taking a slice.
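The "get, then slice" fallback described above can be sketched as follows. This is a minimal illustration and not simplezarr's actual code: a plain dict stands in for the store, and the function takes the (key, range_start, range_length) tuples documented for get_partial_values() further down.

```python
def get_partial_values(data, key_ranges):
    """Return one bytes object per (key, range_start, range_length) entry.

    `data` is a plain dict standing in for a store (an assumption made
    for this sketch).
    """
    results = []
    for key, start, length in key_ranges:
        value = data[key]  # full read, like a plain .get()
        # A length of None means "everything from range_start onwards"
        stop = None if length is None else start + length
        results.append(value[start:stop])
    return results


store = {"a/b": b"0123456789"}
parts = get_partial_values(store, [("a/b", 2, 3), ("a/b", 7, None)])
print(parts)  # [b'234', b'789']
```

The whole value is read even when only a few bytes are needed, which is why a store with real range-read support may want to override this behaviour.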
- class simplezarr.stores.WritableStore
Bases: BaseStore
A store that can write and delete keys.
Partial setting is implemented in this base class by using .get(), updating the value, and then .set(). Similarly, erase_values() and erase_prefix() are implemented in the base class; they can be overridden if a subclass can implement them more efficiently.
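A minimal sketch of the "get, update, set" fallback, again with a plain dict standing in for the store. The zero-padding for writes that start past the end of the current value is an assumption of this sketch; the documentation does not say how simplezarr handles that case.

```python
def set_partial_values(data, key_start_values):
    """Apply (key, range_start, value) writes to a dict of bytes."""
    for key, start, value in key_start_values:
        buf = bytearray(data.get(key, b""))  # read the current value
        # Assumption: pad with zero bytes if the write starts past the end
        if len(buf) < start:
            buf.extend(bytes(start - len(buf)))
        buf[start:start + len(value)] = value  # update in memory
        data[key] = bytes(buf)  # write the full value back


store = {"a/b": b"0123456789"}
set_partial_values(store, [("a/b", 3, b"XYZ")])
print(store["a/b"])  # b'012XYZ6789'
```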
- class simplezarr.stores.ListableStore
Bases: BaseStore
A store that can list keys.
Although list_prefix() and list_dir() are implemented in this base class, subclasses can likely implement them more efficiently.
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys in that directory and not in that of any subdirectories. But it does return prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.
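The difference between the two listing methods can be illustrated with a small sketch that works on a plain set of keys. The functions below are stand-ins written for this example, not the library's implementation; the sorted return order is an assumption.

```python
def list_prefix(keys, prefix):
    """All keys below the given 'directory', recursively."""
    if not prefix.endswith("/"):
        raise ValueError("prefix must end with '/'")
    return sorted(k for k in keys if k.startswith(prefix))


def list_dir(keys, prefix):
    """Direct children only: keys, plus sub-prefixes ending in '/'."""
    if not prefix.endswith("/"):
        raise ValueError("prefix must end with '/'")
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        # If a "/" remains, the key lives in a subdirectory: keep only
        # the subdirectory prefix, including its trailing slash.
        entries.add(prefix + head + sep)
    return sorted(entries)


keys = {"a/b", "a/c", "a/d/e", "a/f/g", "e/f/g"}
print(list_prefix(keys, "a/"))  # ['a/b', 'a/c', 'a/d/e', 'a/f/g']
print(list_dir(keys, "a/"))     # ['a/b', 'a/c', 'a/d/', 'a/f/']
print(list_dir(keys, "b/"))     # []
```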
- class simplezarr.stores.MemoryStore(fields: dict | None = None)
Bases: ReadableStore, WritableStore, ListableStore
Implementation of a readable, writable and listable store, based on an in-memory dict.
- class simplezarr.stores.LocalStore(path: str | Path)
Bases: ReadableStore, WritableStore, ListableStore
Implementation of a readable, writable and listable store, based on the local file system.
The given path represents the root of the store.
- get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]
Retrieve possibly partial values from given key_ranges.
The key_ranges is an iterable of (key, range_start, range_length) tuples, where range_length may be None to indicate the full remaining length.
- set_partial_values(key_start_values: list[tuple[str, int, bytes]])
Store values at a given key, starting at byte range_start.
- erase_prefix(prefix: str)
Erase all keys with the given prefix from the store.
The prefix represents a ‘directory’; it must end with a ‘/’.
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys in that directory and not in that of any subdirectories. But it does return prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.
- class simplezarr.stores.WrapperStore(store: ReadableStore | WritableStore | ListableStore)
Bases: ReadableStore, WritableStore, ListableStore
A store that wraps another store.
- get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]
Retrieve possibly partial values from given key_ranges.
The key_ranges is an iterable of (key, range_start, range_length) tuples, where range_length may be None to indicate the full remaining length.
- set_partial_values(key_start_values: list[tuple[str, int, bytes]])
Store values at a given key, starting at byte range_start.
- erase_prefix(prefix: str)
Erase all keys with the given prefix from the store.
The prefix represents a ‘directory’; it must end with a ‘/’.
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys in that directory and not in that of any subdirectories. But it does return prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.
- class simplezarr.stores.SlowStore(store: ReadableStore | WritableStore | ListableStore, base_delay: float = 1.0, bits_per_second: float = 0.0)
Bases: WrapperStore
A store that has a fixed time delay for reads and writes.
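To illustrate the idea of a delaying wrapper, here is a rough sketch around a plain dict. The class name DelayedDict is hypothetical, and the way the two parameters combine (a fixed base delay plus a size-dependent transfer time, with 0.0 meaning "no bandwidth limit") is an assumption based only on the parameter names; SlowStore itself may behave differently.

```python
import time


class DelayedDict:
    """Hypothetical stand-in: wraps a dict and sleeps on each access."""

    def __init__(self, data, base_delay=1.0, bits_per_second=0.0):
        self._data = data
        self._base_delay = base_delay
        self._bits_per_second = bits_per_second

    def _delay_for(self, nbytes):
        delay = self._base_delay
        # Assumption: a rate of 0.0 means "no bandwidth limit"
        if self._bits_per_second > 0:
            delay += nbytes * 8 / self._bits_per_second
        return delay

    def get(self, key):
        value = self._data[key]
        time.sleep(self._delay_for(len(value)))  # simulate latency
        return value

    def set(self, key, value):
        time.sleep(self._delay_for(len(value)))  # simulate latency
        self._data[key] = value


slow = DelayedDict({"a/b": bytes(1000)}, base_delay=0.01, bits_per_second=800_000)
t0 = time.perf_counter()
slow.get("a/b")  # sleeps for roughly 0.01 s + 8000 bits / 800000 bits/s
elapsed = time.perf_counter() - t0
print(f"read took {elapsed:.3f} s")
```

A wrapper like this is mostly useful for testing how reading code behaves against a slow or remote backend without needing one.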
Nodes
Zarr files are made up of a tree of nodes. Each node is either a ZarrGroup or a ZarrArray. The arrays are the leaf nodes.
- simplezarr.open_zarr(store: ReadableStore) ZarrNode
Open a zarr file using the given store.
- class simplezarr.ZarrNode(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: object
The base class for ZarrGroup and ZarrArray.
A zarr file is made up of nodes, where arrays are the leaf nodes. Each node is represented by a ‘directory’ and a corresponding ‘zarr.json’ that contains information about the node.
- print_metadata()
Print a readable representation of the metadata.
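The "directory plus zarr.json" layout can be illustrated with a small sketch that scans a dict of keys for node metadata. It assumes the Zarr v3 convention that each zarr.json has a "node_type" field that is either "group" or "array". The metadata here is cut down for illustration (a real array needs more fields), and the chunk key is only a placeholder.

```python
import json

# A tiny hand-made store: a root group, a sub-group, and one array.
store = {
    "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}).encode(),
    "images/zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}).encode(),
    "images/raw/zarr.json": json.dumps(
        {"zarr_format": 3, "node_type": "array", "shape": [4, 4]}
    ).encode(),
    "images/raw/c/0/0": b"\x00" * 16,  # placeholder chunk bytes
}


def find_nodes(store):
    """Map each node path to its node type, based on its zarr.json."""
    nodes = {}
    for key, value in store.items():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            path = key[: -len("zarr.json")].rstrip("/")
            nodes[path] = json.loads(value)["node_type"]
    return nodes


print(find_nodes(store))
# {'': 'group', 'images': 'group', 'images/raw': 'array'}
```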
- class simplezarr.ZarrGroup(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: ZarrNode
The class that represents a group in a Zarr file.
The repr() of a group shows its children. One can navigate the Zarr file by indexing: zarr_group['path/to/node']
- class simplezarr.ZarrArray(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: ZarrNode
The class that represents a Zarr array.
These arrays don’t contain any bytes themselves, but are used as proxies to load data from the store, and provide these as numpy arrays.
- property dtype: str
The datatype of the array.
Possible values include ‘bool’, ‘int8’, ‘int16’, ‘int32’, ‘int64’, ‘uint8’, ‘uint16’, ‘uint32’, ‘uint64’, ‘float16’, ‘float32’, ‘float64’, ‘complex64’, ‘complex128’, ‘rx’ (with x a multiple of 8).
- property ndim: int
The number of dimensions of the array (includes spatial, time, and channel dimensions).
- property chunks
Select a contiguous set of chunks.
Similar to __getitem__, but the indices are coordinates in the chunk grid.
Example:
    arr.chunks[0, 0]  # Select array for the first chunk
    arr.chunks[0, :]  # Select array for first row of chunks
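How chunk-grid coordinates relate to element indices can be sketched as below. The helper name and its arguments are hypothetical, and it assumes a regular chunk grid, non-negative integer indices and step-free slices. It illustrates the mapping only and is not the library's code.

```python
def chunk_selection_to_slices(chunk_index, chunk_shape, array_shape):
    """Map chunk-grid coordinates to element slices, one per dimension."""
    slices = []
    for idx, csize, asize in zip(chunk_index, chunk_shape, array_shape):
        nchunks = -(-asize // csize)  # ceiling division
        if isinstance(idx, int):
            idx = slice(idx, idx + 1)  # a single chunk along this axis
        first, last, step = idx.indices(nchunks)
        if step != 1:
            raise ValueError("only contiguous chunk selections are supported")
        # The last chunk may be partial, so clip to the array size
        slices.append(slice(first * csize, min(last * csize, asize)))
    return tuple(slices)


array_shape, chunk_shape = (250, 300), (100, 100)
first_chunk = chunk_selection_to_slices((0, 0), chunk_shape, array_shape)
first_row = chunk_selection_to_slices((0, slice(None)), chunk_shape, array_shape)
edge_chunk = chunk_selection_to_slices((2, 1), chunk_shape, array_shape)
print(first_chunk)  # (slice(0, 100, None), slice(0, 100, None))
print(first_row)    # (slice(0, 100, None), slice(0, 300, None))
print(edge_chunk)   # (slice(200, 250, None), slice(100, 200, None))
```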
- __getitem__(selection) ZarrArraySlice
Select a slice from the array.
The returned ZarrArraySlice can be used to get and set the actual data.
Examples:
    # The lines below assume ndim=2 for the sake of simplicity
    arr[...]  # select the whole array
    arr[0, :]  # select one row
    arr[:10, 100:800:5]  # slice, optionally with steps
    arr[10, 10]  # select a scalar
- get_chunk_now(index) ndarray
Read a chunk from the store.
This function is synchronous; you may want to use get_chunk_soon() to do the loading and decompression in a separate thread.
Converts the index to the path for that chunk, loads the bytes from the store, and decodes them into a numpy array. This function is blocking (no threading or async).
- get_chunk_soon(index) Future[ndarray]
Read a chunk and return a concurrent.futures.Future.
Calls get_chunk_now() in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple reads in parallel.
This has little to do with async programming and asyncio, although the future object can be converted to an awaitable using asyncio.wrap_future().
Example to wait for the data:
    f = zarr_array.get_chunk_soon(...)
    data = f.result()
Combine multiple reads in parallel:
    f1 = zarr_array.get_chunk_soon(...)
    f2 = zarr_array.get_chunk_soon(...)
    f3 = zarr_array.get_chunk_soon(...)
    data1, data2, data3 = [f.result() for f in [f1, f2, f3]]
Asynchronously await the data:
    f = zarr_array.get_chunk_soon(...)
    data = await asyncio.wrap_future(f)
Async and parallel reads:
    f1 = zarr_array.get_chunk_soon(...)
    f2 = zarr_array.get_chunk_soon(...)
    f3 = zarr_array.get_chunk_soon(...)
    asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
    data1, data2, data3 = await asyncio.gather(*asyncio_futures)
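For readers who want to try the "now" and "soon" pattern without a Zarr file at hand, the self-contained sketch below reproduces it with the standard library only. load_chunk_now() and load_chunk_soon() are hypothetical stand-ins for get_chunk_now() and get_chunk_soon(); the sleep simulates I/O and decoding.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def load_chunk_now(index):
    """Hypothetical blocking read; the sleep stands in for I/O and decoding."""
    time.sleep(0.05)
    return f"chunk-{index}"


def load_chunk_soon(index):
    """Run the blocking read in a worker thread and return a Future."""
    return executor.submit(load_chunk_now, index)


# Synchronous code: start all reads first, then collect the results.
futures = [load_chunk_soon(i) for i in range(3)]
sync_results = [f.result() for f in futures]


async def main():
    # Async code: wrap each Future so that it can be awaited.
    futures = [load_chunk_soon(i) for i in range(3)]
    return await asyncio.gather(*(asyncio.wrap_future(f) for f in futures))


async_results = asyncio.run(main())
executor.shutdown()
print(sync_results)   # ['chunk-0', 'chunk-1', 'chunk-2']
print(async_results)  # ['chunk-0', 'chunk-1', 'chunk-2']
```

In both variants the three reads overlap in time, because every read is submitted before any result is requested.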
- set_chunk_now(index, data, check_empty=True) None
Write a chunk to the store.
Converts the index to the path for that chunk, encodes the array to bytes, and saves these to the store. This function is blocking (no threading or async).
- set_chunk_soon(index, data) Future[None]
Write a chunk and return a concurrent.futures.Future.
Calls set_chunk_now() in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple writes in parallel.
This has little to do with async programming and asyncio, although the future object can be converted to an awaitable using asyncio.wrap_future().
Example to write and forget:
    f = zarr_array.set_chunk_soon(...)
Combine multiple writes in parallel, and wait for them to finish:
    f1 = zarr_array.set_chunk_soon(...)
    f2 = zarr_array.set_chunk_soon(...)
    f3 = zarr_array.set_chunk_soon(...)
    [f.result() for f in [f1, f2, f3]]
Asynchronously await the write:
    f = zarr_array.set_chunk_soon(...)
    await asyncio.wrap_future(f)
Async and parallel writes:
    f1 = zarr_array.set_chunk_soon(...)
    f2 = zarr_array.set_chunk_soon(...)
    f3 = zarr_array.set_chunk_soon(...)
    asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
    await asyncio.gather(*asyncio_futures)
Indexing
- class simplezarr.ZarrArraySlice(array: ZarrArray, selection: tuple)
Bases: object
A slice of a ZarrArray that can be used to get and set data.
Usage:
    # Select the row with index 5, and the first 100 columns
    sub = zarr_array[5, :100]
    # Now get the data as a numpy array
    a = sub.get_now()
    # Short form
    a = zarr_array[5, :100].get_now()
- property shape: tuple[int, ...]
The shape of the sub-array. Some dimensions can be collapsed; the slice can even represent a scalar.
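How a selection determines the resulting shape can be sketched as follows, assuming numpy-style semantics in which an integer index collapses its dimension and a slice keeps it. The helper is hypothetical and only handles integers and slices (no Ellipsis).

```python
def selection_shape(selection, array_shape):
    """Shape of the sub-array selected by a tuple of ints and slices."""
    shape = []
    for sel, size in zip(selection, array_shape):
        if isinstance(sel, int):
            continue  # an integer index collapses this dimension
        # slice.indices() resolves None and clips to the axis length
        shape.append(len(range(*sel.indices(size))))
    return tuple(shape)


array_shape = (1000, 800)
print(selection_shape((5, slice(None, 100)), array_shape))              # (100,)
print(selection_shape((slice(None, 10), slice(100, 800, 5)), array_shape))  # (10, 140)
print(selection_shape((10, 10), array_shape))                           # ()
```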
- get_soon() Future[ndarray]
Get the data for this ZarrArraySlice as a numpy array.
Returns a Future so the caller can wait for it in an appropriate way (e.g. wait for multiple gets in parallel).
- get_now() ndarray
Get the data for this ZarrArraySlice as a numpy array.
Blocks while waiting for the data to arrive. If the requested data consists of multiple chunks, these chunks are loaded in parallel.
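To give an idea of which chunks a multi-chunk request involves, the sketch below lists the chunk-grid coordinates that a selection overlaps. It assumes a regular chunk grid and non-empty slices with explicit bounds and a step of 1; how simplezarr determines and schedules the chunks internally is not described here.

```python
from itertools import product


def overlapping_chunks(slices, chunk_shape):
    """Chunk-grid coordinates touched by a tuple of element slices."""
    ranges = [
        # First and last chunk index along this axis (stop is exclusive)
        range(s.start // csize, (s.stop - 1) // csize + 1)
        for s, csize in zip(slices, chunk_shape)
    ]
    return list(product(*ranges))


# Rows 50..249 and columns 0..99 of an array with 100x100 chunks
touched = overlapping_chunks((slice(50, 250), slice(0, 100)), (100, 100))
print(touched)  # [(0, 0), (1, 0), (2, 0)]
```

Each of these coordinates corresponds to one chunk read, which is what makes loading them in parallel worthwhile.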
- set_soon(value: float | ndarray) Future
Set the data for this ZarrArraySlice using a numpy array.
Returns a Future so the caller can wait in an appropriate way for the write to finish. You could “fire and forget”, but then you won’t see any errors if the write fails. If you are in an async framework, you can await the future so that you do see the error.
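The point about hidden errors applies to any concurrent.futures.Future, and can be demonstrated with the standard library alone. In the sketch below, failing_write() is a hypothetical stand-in for a write that fails inside the worker thread.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_write():
    """Hypothetical write that fails inside the worker thread."""
    raise OSError("disk full")


with ThreadPoolExecutor(max_workers=1) as executor:
    # Fire and forget: the exception is stored on the future, not raised.
    forgotten = executor.submit(failing_write)

    # Waiting for the result re-raises the exception in the caller.
    checked = executor.submit(failing_write)
    try:
        checked.result()
        error = None
    except OSError as exc:
        error = exc

print(repr(error))                  # OSError('disk full')
print(repr(forgotten.exception()))  # OSError('disk full')
```

The first failure only becomes visible because the code inspects the future afterwards; without that, it would pass silently.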