API
The lightweight core of simplezarr.
Stores
Zarr data is stored as a set of key-value pairs. The backing storage can be a directory of files on your hard drive, a Python dict in memory, or a remote resource accessible over the internet.
Stores give access to that data in a consistent way, so that the code that reads and writes the Zarr data does not need to know how or where the data is stored. Several store implementations are provided, along with wrapper stores for various purposes.
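The idea of store-independent code can be sketched as follows. The classes below are hypothetical stand-ins, not simplezarr's own stores, and the `get(key)` / `set(key, value)` signatures are an assumption made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class KeyValueStore(Protocol):
    """Hypothetical minimal interface, not simplezarr's actual base class."""

    def get(self, key: str) -> bytes: ...
    def set(self, key: str, value: bytes) -> None: ...


class DictStore:
    """Stand-in that keeps values in a Python dict."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> bytes:
        return self._data[key]

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class DirStore:
    """Stand-in that keeps each value in a file below a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()

    def set(self, key: str, value: bytes) -> None:
        target = self._root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(value)


def copy_key(src: KeyValueStore, dst: KeyValueStore, key: str) -> None:
    """Copy one value; the caller does not need to know where data lives."""
    dst.set(key, src.get(key))


memory = DictStore()
memory.set("group/zarr.json", b"{}")

with tempfile.TemporaryDirectory() as tmp:
    disk = DirStore(Path(tmp))
    copy_key(memory, disk, "group/zarr.json")
    copied = disk.get("group/zarr.json")
```

`copy_key()` only relies on the two methods, so the same function works for the in-memory and the on-disk stand-in.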
- class simplezarr.stores.ReadableStore
Bases: BaseStore
A store that can read keys.
Partial getting is implemented in this base class by using .get(), and then taking a slice.
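A minimal sketch of the get-then-slice approach, assuming the (start, length) convention where a length of None means "up to the end". The helper `get_partial` and its signature are hypothetical; a plain dict lookup stands in for the store's `.get()`.

```python
from typing import Callable, Optional


def get_partial(
    get: Callable[[str], bytes], key: str, start: int, length: Optional[int]
) -> bytes:
    """Fetch the full value, then return only the requested byte range."""
    value = get(key)
    end = None if length is None else start + length
    return value[start:end]


data = {"a/b": b"0123456789"}  # stand-in for a store
middle = get_partial(data.__getitem__, "a/b", 2, 3)
tail = get_partial(data.__getitem__, "a/b", 7, None)
```

This always transfers the whole value, which is why a store that can read byte ranges natively may want to override it.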
- class simplezarr.stores.WritableStore
Bases: BaseStore
A store that can write and delete keys.
Partial setting is implemented in this base class by using .get(), updating the value, and then .set(). Similarly, erase_values() and erase_prefix() are implemented in the base class; they can be overridden if a subclass can implement them more efficiently.
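The read-modify-write cycle for partial setting might look like the sketch below. A dict stands in for the store, `set_partial` is a hypothetical helper, and zero-padding a gap when the start offset lies beyond the current value is an assumption, not documented behavior.

```python
def set_partial(store: dict, key: str, start: int, value: bytes) -> None:
    """Overwrite bytes of store[key] beginning at offset `start`."""
    current = bytearray(store.get(key, b""))  # read the existing value
    if len(current) < start:
        # Assumption: fill any gap before `start` with zero bytes.
        current.extend(b"\x00" * (start - len(current)))
    current[start:start + len(value)] = value  # update in memory
    store[key] = bytes(current)  # write the full value back


store = {"k": b"0123456789"}
set_partial(store, "k", 3, b"abc")
```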
- class simplezarr.stores.ListableStore
Bases: BaseStore
A store that can list keys.
Although list_prefix() and list_dir() are implemented in this base class, subclasses can likely implement them more efficiently.
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories. It does, however, return the prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return the empty set.
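The difference between the two listing methods can be illustrated with plain functions over a collection of keys. This is a sketch of the semantics described above, not the library's implementation; the results are sorted here only to make the output deterministic.

```python
def list_prefix(keys, prefix: str) -> list:
    """All keys below the prefix, recursively."""
    return sorted(k for k in keys if k.startswith(prefix))


def list_dir(keys, prefix: str) -> list:
    """Keys directly in the 'directory', plus prefixes of subdirectories."""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Keep only the first path component after the prefix; `sep` is
        # "/" when the key lives in a subdirectory, "" otherwise.
        head, sep, _ = key[len(prefix):].partition("/")
        entries.add(prefix + head + sep)
    return sorted(entries)


recursive = list_prefix(["a/b", "a/c/d", "e/f/g"], "a/")
direct = list_dir(["a/b", "a/c", "a/d/e", "a/f/g"], "a/")
missing = list_dir(["a/b", "a/c", "a/d/e", "a/f/g"], "b/")
```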
- class simplezarr.stores.MemoryStore(fields: dict | None = None)
Bases: ReadableStore, WritableStore, ListableStore
Implementation of a readable, writable and listable store, based on an in-memory dict.
- class simplezarr.stores.LocalStore(path: str | Path)
Bases: ReadableStore, WritableStore, ListableStore
Implementation of a readable, writable and listable store, based on the local file system.
The given path represents the root of the store.
- get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]
Retrieve possibly partial values from given key_ranges.
The key_ranges is an iterable of (key, range_start, range_length), where range_length may be None to indicate the full remaining length.
- set_partial_values(key_start_values: list[tuple[str, int, bytes]])
Store values at a given key, starting at byte range_start.
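On a file system, byte ranges can be read and written without loading the whole file. The sketch below shows one way to do that with seek(); `read_range` and `write_range` are hypothetical helpers, and whether LocalStore works this way internally is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_range(path: Path, start: int, length: Optional[int]) -> bytes:
    """Read `length` bytes from offset `start`; None reads to the end."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read() if length is None else f.read(length)


def write_range(path: Path, start: int, value: bytes) -> None:
    """Overwrite bytes in place, starting at offset `start`."""
    # "r+b" opens an existing file for updating without truncating it.
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(value)


with tempfile.TemporaryDirectory() as tmp:
    chunk = Path(tmp) / "chunk"
    chunk.write_bytes(b"0123456789")
    write_range(chunk, 3, b"abc")
    # Same (key, range_start, range_length) shape as key_ranges.
    key_ranges = [(chunk, 0, 4), (chunk, 6, None)]
    parts = [read_range(p, s, n) for p, s, n in key_ranges]
```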
- erase_prefix(prefix: str)
Erase all keys with the given prefix from the store.
The prefix represents a ‘directory’; it must end with a ‘/’.
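The trailing ‘/’ matters because it keeps similarly named keys apart. A minimal sketch over a dict, where the helper name and the ValueError for a missing slash are assumptions for illustration:

```python
def erase_prefix(store: dict, prefix: str) -> None:
    """Delete every key that lies below the given 'directory'."""
    if not prefix.endswith("/"):
        # Assumption: reject prefixes without the trailing slash.
        raise ValueError("prefix must end with '/'")
    # Collect the keys first; a dict cannot change size while iterating.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]


store = {"a/b": b"1", "a/c/d": b"2", "ab/c": b"3", "e/f/g": b"4"}
erase_prefix(store, "a/")
remaining = sorted(store)
```

With the prefix "a/", the key "ab/c" is left alone, which would not be the case for a bare "a".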
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories. It does, however, return the prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return the empty set.
- class simplezarr.stores.WrapperStore(store: ReadableStore | WritableStore | ListableStore)
Bases: ReadableStore, WritableStore, ListableStore
A store that wraps another store.
- get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]
Retrieve possibly partial values from given key_ranges.
The key_ranges is an iterable of (key, range_start, range_length), where range_length may be None to indicate the full remaining length.
- set_partial_values(key_start_values: list[tuple[str, int, bytes]])
Store values at a given key, starting at byte range_start.
- erase_prefix(prefix: str)
Erase all keys with the given prefix from the store.
The prefix represents a ‘directory’; it must end with a ‘/’.
- list_prefix(prefix: str) list[str]
Retrieve all keys with a given prefix.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.
For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
- list_dir(prefix: str) list[str]
Retrieve all keys within a given directory.
The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories. It does, however, return the prefixes (i.e. directories) within the given directory.
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return the empty set.
- class simplezarr.stores.SlowStore(store: ReadableStore | WritableStore | ListableStore, base_delay: float = 1.0, bits_per_second: float = 0.0)
Bases: WrapperStore
A store that has a fixed time delay for reads and writes.
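How a delaying wrapper could work is sketched below. `DelayedStore` is a hypothetical stand-in that wraps a dict. Treating `bits_per_second=0.0` as "no bandwidth limit" and the formula `base_delay + bits / bits_per_second` are assumptions; the documentation above does not spell out the exact model. The `sleep` argument exists only so the delay can be observed without actually waiting.

```python
import time


class DelayedStore:
    """Hypothetical wrapper that sleeps before completing reads and writes."""

    def __init__(self, store, base_delay=1.0, bits_per_second=0.0,
                 sleep=time.sleep):
        self._store = store
        self._base_delay = base_delay
        self._bits_per_second = bits_per_second
        self._sleep = sleep

    def _delay(self, nbytes: int) -> float:
        delay = self._base_delay
        if self._bits_per_second > 0:
            # Assumption: transfer time grows linearly with payload size.
            delay += nbytes * 8 / self._bits_per_second
        return delay

    def get(self, key: str) -> bytes:
        value = self._store[key]
        self._sleep(self._delay(len(value)))
        return value

    def set(self, key: str, value: bytes) -> None:
        self._sleep(self._delay(len(value)))
        self._store[key] = value


waits = []  # records the requested delays instead of sleeping
slow = DelayedStore({"k": b"x" * 1000}, base_delay=0.5,
                    bits_per_second=8000, sleep=waits.append)
value = slow.get("k")
```

A wrapper like this is mostly useful for simulating a slow remote resource while testing against a fast local one.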
Nodes
Zarr files are made up of a tree of nodes. Each node is either a ZarrGroup or a ZarrArray. The arrays are the leaf nodes.
- simplezarr.nodes.open_zarr(store: ReadableStore) ZarrNode
Open a zarr file using the given store.
- class simplezarr.nodes.ZarrNode(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: object
The base class for ZarrGroup and ZarrArray.
A zarr file is made up of nodes, where arrays are the leaf nodes. Each node is represented by a ‘directory’ and a corresponding ‘zarr.json’ that contains information about the node.
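As an illustration of the ‘directory’ plus ‘zarr.json’ layout, the sketch below looks up the metadata document for a node path and reads its type. It assumes the Zarr v3 convention that zarr.json carries a "node_type" field with the value "group" or "array"; the helper `read_node_type` and the dict standing in for a store are hypothetical.

```python
import json


def read_node_type(store: dict, path: str) -> str:
    """Return 'group' or 'array' for the node at `path` ('' is the root)."""
    key = f"{path}/zarr.json" if path else "zarr.json"
    metadata = json.loads(store[key])
    return metadata["node_type"]


# Stand-in store holding a root group with one array below it.
store = {
    "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}).encode(),
    "images/raw/zarr.json": json.dumps(
        {"zarr_format": 3, "node_type": "array", "shape": [4, 4]}
    ).encode(),
}

root_type = read_node_type(store, "")
leaf_type = read_node_type(store, "images/raw")
```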
- print_metadata()
Print a readable representation of the metadata.
- class simplezarr.nodes.ZarrGroup(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: ZarrNode
The class that represents a group in a Zarr file.
The repr() of a group shows its children. One can navigate the Zarr file by indexing: zarr_group['path/to/node']
- class simplezarr.nodes.ZarrArray(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)
Bases: ZarrNode
The class that represents a Zarr array.
These arrays don’t contain any bytes themselves, but are used as proxies to load data from the store, and provide these as numpy arrays.
- property dtype: str
The datatype of the array.
Possible values include ‘bool’, ‘int8’, ‘int16’, ‘int32’, ‘int64’, ‘uint8’, ‘uint16’, ‘uint32’, ‘uint64’, ‘float16’, ‘float32’, ‘float64’, ‘complex64’, ‘complex128’, ‘rx’ (with x a multiple of 8).
- property ndim: int
The number of dimensions of the array (includes spatial, time, and channel dimensions).
- get_chunk(index) ndarray
Read a chunk from the store.
This function is synchronous and blocking (no threading or async); you may want to use get_chunk_future() to do the loading and decompression in a separate thread.
It converts the index to the path for that chunk, loads the bytes from the store, and decodes them into a numpy array.
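The first step, turning a chunk index into a store key, can be sketched as below. This assumes the default chunk key encoding of the Zarr v3 specification ("c" followed by the indices, joined by "/"); whether simplezarr supports other encodings is not stated here, and `chunk_key` is a hypothetical helper.

```python
def chunk_key(array_path: str, index: tuple, separator: str = "/") -> str:
    """Build the store key for one chunk of the array at `array_path`."""
    # Zarr v3 default encoding: "c", then one component per dimension.
    key = separator.join(["c", *map(str, index)])
    return f"{array_path}/{key}" if array_path else key


key_3d = chunk_key("images/raw", (0, 2, 5))
key_scalar = chunk_key("", ())  # zero-dimensional array at the root
```

The bytes stored under that key are then passed through the array's codecs to produce the numpy array.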
- get_chunk_future(index) Future[ndarray]
Read a chunk and return a concurrent.futures.Future.
The loading happens in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple reads in parallel.
This has little to do with async programming and asyncio, although the future object can be converted to an awaitable using asyncio.wrap_future().
Example to wait for the data:
    f = zarr_array.get_chunk_future(...)
    data = f.result()
Combine multiple reads in parallel:
    f1 = zarr_array.get_chunk_future(...)
    f2 = zarr_array.get_chunk_future(...)
    f3 = zarr_array.get_chunk_future(...)
    data1, data2, data3 = [f.result() for f in [f1, f2, f3]]
Asynchronously await the data:
    f = zarr_array.get_chunk_future(...)
    data = await asyncio.wrap_future(f)
Async and parallel reads:
    f1 = zarr_array.get_chunk_future(...)
    f2 = zarr_array.get_chunk_future(...)
    f3 = zarr_array.get_chunk_future(...)
    asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
    data1, data2, data3 = await asyncio.gather(*asyncio_futures)
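The patterns above can be tried without a Zarr file using the self-contained sketch below. `load_chunk` is a hypothetical stand-in for a slow chunk read, and `load_chunk_future` mimics the shape of get_chunk_future() by submitting it to a ThreadPoolExecutor; neither is part of simplezarr.

```python
import asyncio
import time
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def load_chunk(index: tuple) -> bytes:
    """Stand-in for a blocking chunk read."""
    time.sleep(0.05)  # simulate I/O latency
    return bytes(index)


def load_chunk_future(index: tuple) -> Future:
    """Start the read in a worker thread and return immediately."""
    return executor.submit(load_chunk, index)


indices = [(0, 0), (0, 1), (1, 0)]

# Blocking style: start all reads first, then collect the results.
futures = [load_chunk_future(i) for i in indices]
blocking_results = [f.result() for f in futures]


async def load_all(chunk_indices):
    # wrap_future() must be called while an event loop is running.
    wrapped = [asyncio.wrap_future(load_chunk_future(i)) for i in chunk_indices]
    return await asyncio.gather(*wrapped)


async_results = asyncio.run(load_all(indices))
executor.shutdown()
```

In both styles the three reads overlap, because all futures are created before any result is awaited.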
- set_chunk(data, index, check_empty=True) None
Write a chunk to the store.
Converts the index to the path for that chunk, encodes the array to bytes, and saves these to the store. This function is blocking (no threading or async).
- set_chunk_future(data, index) Future[None]
Write a chunk and return a concurrent.futures.Future.
The writing happens in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple writes in parallel.
This has little to do with async programming and asyncio, although the future object can be converted to an awaitable using asyncio.wrap_future().
Example to write and forget:
    f = zarr_array.set_chunk_future(...)
Combine multiple writes in parallel, and wait for them to finish:
    f1 = zarr_array.set_chunk_future(...)
    f2 = zarr_array.set_chunk_future(...)
    f3 = zarr_array.set_chunk_future(...)
    [f.result() for f in [f1, f2, f3]]
Asynchronously await the write:
    f = zarr_array.set_chunk_future(...)
    await asyncio.wrap_future(f)
Async and parallel writes:
    f1 = zarr_array.set_chunk_future(...)
    f2 = zarr_array.set_chunk_future(...)
    f3 = zarr_array.set_chunk_future(...)
    asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
    await asyncio.gather(*asyncio_futures)