API

The lightweight core of simplezarr.

Stores

Zarr data is stored as a set of key-value pairs. These can live in a directory of files on your hard drive, in a Python dict in memory, or in a remote resource accessible over the internet.

Stores give access to that data in a consistent way, so that code that reads or writes Zarr data does not have to care how or where the data is stored. Multiple implementations are provided, as well as wrapper stores for various purposes.

class simplezarr.stores.BaseStore

Bases: object

The base store class.

class simplezarr.stores.ReadableStore

Bases: BaseStore

A store that can read keys.

Partial getting is implemented in this base class by using .get(), and then taking a slice.

get(key: str) bytes

Retrieve the value associated with a given key.

get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]

Retrieve possibly partial values from given key_ranges.

The key_ranges is an iterable of (key, range_start, range_length), where range_length may be None to indicate the full remaining length.
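
The fallback described above (fetch the whole value with get(), then take a slice) can be sketched as follows. This is a minimal illustration and not simplezarr's actual code: the function name get_partial_values_via_get and the dict that stands in for a store are assumptions made for the example.

```python
def get_partial_values_via_get(get, key_ranges):
    """Fetch each full value with get(), then slice out the requested range."""
    results = []
    for key, range_start, range_length in key_ranges:
        value = get(key)
        if range_length is None:
            # None means "everything from range_start to the end".
            results.append(value[range_start:])
        else:
            results.append(value[range_start:range_start + range_length])
    return results


# A plain dict stands in for a store here (assumption for the example).
data = {"a/b": b"0123456789"}
parts = get_partial_values_via_get(
    data.__getitem__, [("a/b", 2, 3), ("a/b", 7, None)]
)
```

The cost of this approach is that every partial read transfers the full value, which is why a store with native range reads would likely override it.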

class simplezarr.stores.WritableStore

Bases: BaseStore

A store that can write and delete keys.

Partial setting is implemented in this base class by using .get(), updating the value, and then .set(). Similarly, erase_values() and erase_prefix() are implemented in the base class; they can be overridden if a subclass can implement them more efficiently.

set(key: str, value: bytes)

Store a (key, value) pair.

set_partial_values(key_start_values: list[tuple[str, int, bytes]])

Store each value at its given key, starting at byte offset range_start. The key_start_values is an iterable of (key, range_start, value).
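
The read-modify-write approach that the base class is described as using can be sketched like this. It is an illustration under assumptions, not the library's code: the function name is hypothetical, a dict stands in for the store, and the sketch assumes the key already exists and that range_start lies within the existing value.

```python
def set_partial_values_via_get_set(get, set_, key_start_values):
    """Read the full value, patch the byte range, and write it back."""
    for key, range_start, value in key_start_values:
        current = bytearray(get(key))
        # Overwrite in place; the value grows if the patch runs past the end.
        current[range_start:range_start + len(value)] = value
        set_(key, bytes(current))


# A plain dict stands in for a store here (assumption for the example).
data = {"k": b"0123456789"}
set_partial_values_via_get_set(
    data.__getitem__, data.__setitem__, [("k", 3, b"abc")]
)
patched = data["k"]
```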

erase(key: str)

Erase the given key/value pair from the store.

erase_values(keys: list[str])

Erase the given key/value pairs from the store.

erase_prefix(prefix: str)

Erase all keys with the given prefix from the store.

The prefix represents a ‘directory’; it must end with a ‘/’.

class simplezarr.stores.ListableStore

Bases: BaseStore

A store that can list keys.

Although list_prefix() and list_dir() are implemented in this base class, subclasses can likely implement them more efficiently.

list() list[str]

Retrieve all keys in the store.

list_prefix(prefix: str) list[str]

Retrieve all keys with a given prefix.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.

For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.
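
The same example, written as a small sketch over a plain list of keys. The helper name list_prefix_from_keys is hypothetical, and raising ValueError for a prefix without a trailing slash is an assumption; the documentation only states that the prefix must end with one.

```python
def list_prefix_from_keys(keys, prefix):
    """Return every key that starts with the given 'directory' prefix."""
    if not prefix.endswith("/"):
        # Assumption: reject malformed prefixes with a ValueError.
        raise ValueError("prefix must end with '/'")
    return [key for key in keys if key.startswith(prefix)]


keys = ["a/b", "a/c/d", "e/f/g"]
result = list_prefix_from_keys(keys, "a/")
```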

list_dir(prefix: str) list[str]

Retrieve all keys within a given directory.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories, but it does return the prefixes (i.e. directories) within the given directory.

For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.
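
A possible way to derive this result from a flat list of keys is sketched below. The helper name list_dir_from_keys is hypothetical, and returning a sorted list is an assumption of the sketch; the documentation does not specify an ordering.

```python
def list_dir_from_keys(keys, prefix):
    """Return the keys and sub-directory prefixes directly under prefix."""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        name, sep, _ = rest.partition("/")
        # With a separator the entry is a sub-directory prefix ("a/d/"),
        # without one it is a key that lives directly in the directory.
        entries.add(prefix + name + sep)
    return sorted(entries)


keys = ["a/b", "a/c", "a/d/e", "a/f/g"]
in_a = list_dir_from_keys(keys, "a/")
in_b = list_dir_from_keys(keys, "b/")
```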

class simplezarr.stores.MemoryStore(fields: dict | None = None)

Bases: ReadableStore, WritableStore, ListableStore

Implementation of a readable, writable and listable store, based on an in-memory dict.

get(key: str) bytes

Retrieve the value associated with a given key.

set(key: str, value: bytes)

Store a (key, value) pair.

erase(key: str)

Erase the given key/value pair from the store.

list() list[str]

Retrieve all keys in the store.
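
To make the four-method interface concrete, here is a minimal dict-backed class with the same method names. It is a stand-in written for this example (the name DictStore is hypothetical), not the MemoryStore implementation; copying the initial dict and ignoring a missing key in erase() are assumptions.

```python
class DictStore:
    """Hypothetical dict-backed store with get/set/erase/list."""

    def __init__(self, fields=None):
        # Copy so the caller's dict is not mutated (assumption of this sketch).
        self._fields = dict(fields or {})

    def get(self, key):
        return self._fields[key]

    def set(self, key, value):
        self._fields[key] = bytes(value)

    def erase(self, key):
        # Erasing a missing key is silently ignored here (assumption).
        self._fields.pop(key, None)

    def list(self):
        return sorted(self._fields)


store = DictStore()
store.set("zarr.json", b"{}")
store.set("a/zarr.json", b"{}")
keys = store.list()
store.erase("a/zarr.json")
```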

class simplezarr.stores.LocalStore(path: str | Path)

Bases: ReadableStore, WritableStore, ListableStore

Implementation of a readable, writable and listable store, based on the local file system.

The given path represents the root of the store.

get(key: str) bytes

Retrieve the value associated with a given key.

get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]

Retrieve possibly partial values from given key_ranges.

The key_ranges is an iterable of (key, range_start, range_length), where range_length may be None to indicate the full remaining length.

set(key: str, value: bytes)

Store a (key, value) pair.

set_partial_values(key_start_values: list[tuple[str, int, bytes]])

Store each value at its given key, starting at byte offset range_start. The key_start_values is an iterable of (key, range_start, value).

erase(key: str)

Erase the given key/value pair from the store.

erase_values(keys: list[str])

Erase the given key/value pairs from the store.

erase_prefix(prefix: str)

Erase all keys with the given prefix from the store.

The prefix represents a ‘directory’; it must end with a ‘/’.

list() list[str]

Retrieve all keys in the store.

list_prefix(prefix: str) list[str]

Retrieve all keys with a given prefix.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.

For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.

list_dir(prefix: str) list[str]

Retrieve all keys within a given directory.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories, but it does return the prefixes (i.e. directories) within the given directory.

For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.

class simplezarr.stores.WrapperStore(store: ReadableStore | WritableStore | ListableStore)

Bases: ReadableStore, WritableStore, ListableStore

A store that wraps another store.

get(key: str) bytes

Retrieve the value associated with a given key.

get_partial_values(key_ranges: list[tuple[str, int, int | None]]) list[bytes]

Retrieve possibly partial values from given key_ranges.

The key_ranges is an iterable of (key, range_start, range_length), where range_length may be None to indicate the full remaining length.

set(key: str, value: bytes)

Store a (key, value) pair.

set_partial_values(key_start_values: list[tuple[str, int, bytes]])

Store each value at its given key, starting at byte offset range_start. The key_start_values is an iterable of (key, range_start, value).

erase(key: str)

Erase the given key/value pair from the store.

erase_values(keys: list[str])

Erase the given key/value pairs from the store.

erase_prefix(prefix: str)

Erase all keys with the given prefix from the store.

The prefix represents a ‘directory’; it must end with a ‘/’.

list() list[str]

Retrieve all keys in the store.

list_prefix(prefix: str) list[str]

Retrieve all keys with a given prefix.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists the full (recursive) list of items in that directory.

For example, if a store contains the keys “a/b”, “a/c/d” and “e/f/g”, then list_prefix("a/") would return “a/b” and “a/c/d”.

list_dir(prefix: str) list[str]

Retrieve all keys within a given directory.

The prefix represents a ‘directory’; it must end with a ‘/’. This method lists only the keys directly in that directory, not those in any subdirectories, but it does return the prefixes (i.e. directories) within the given directory.

For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”. list_dir("b/") would return an empty list.

class simplezarr.stores.SlowStore(store: ReadableStore | WritableStore | ListableStore, base_delay: float = 1.0, bits_per_second: float = 0.0)

Bases: WrapperStore

A store that has a fixed time delay for reads and writes.

Nodes

Zarr files are made up of a tree of nodes. Each node is either a ZarrGroup or a ZarrArray. The arrays are the leaf nodes.

simplezarr.nodes.open_zarr(store: ReadableStore) ZarrNode

Open a zarr file using the given store.

class simplezarr.nodes.ZarrNode(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)

Bases: object

The base class for ZarrGroup and ZarrArray.

A zarr file is made up of nodes, where arrays are the leaf nodes. Each node is represented by a ‘directory’ and a corresponding ‘zarr.json’ that contains information about the node.

property store: BaseStore

The store for this node.

property name: str

The name of this node.

property path: str

The full path of this node in the store.

property metadata: dict

The metadata as a dictionary.

print_metadata()

Print a readable representation of the metadata.

class simplezarr.nodes.ZarrGroup(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)

Bases: ZarrNode

The class that represents a group in a Zarr file.

The repr() of a group shows its children. One can navigate the Zarr file by indexing:

zarr_group["path/to/node"]

property children: tuple[ZarrNode]

The child nodes of this group. These can be groups or arrays.

property attributes: dict

The attributes of this group, i.e. metadata["attributes"].

print_structure(max_depth: int = 999)

Print the structure of the Zarr file from this group and below.

get_structure(max_depth: int = 999, indent: int = 0) str

Get the structure of this group as a human-readable string.

class simplezarr.nodes.ZarrArray(store: ReadableStore | ListableStore | WritableStore, path: str, metadata: dict | None = None)

Bases: ZarrNode

The class that represents a Zarr array.

These arrays don’t contain any bytes themselves, but are used as proxies to load data from the store, and provide these as numpy arrays.

property dtype: str

The datatype of the array.

Possible values include ‘bool’, ‘int8’, ‘int16’, ‘int32’, ‘int64’, ‘uint8’, ‘uint16’, ‘uint32’, ‘uint64’, ‘float16’, ‘float32’, ‘float64’, ‘complex64’, ‘complex128’, ‘rx’ (with x a multiple of 8).

property ndim: int

The number of dimensions of the array (includes spatial, time, and channel dimensions).

property shape: tuple[int, ...]

The shape of the array (ndim elements).

property size: int

The size of the array, expressed in number of elements.

property nbytes: int

The size of the array in bytes (uncompressed).

property chunk_grid_shape: tuple[int, ...]

The shape of the chunk grid (ndim elements).

property chunk_shape: tuple[int, ...]

The shape of each chunk (ndim elements).

property chunk_size: int

The size of each chunk, in number of elements.

property chunk_nbytes: int

The size of each chunk in (uncompressed) bytes.
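
How the size-related properties relate to each other can be shown with a little arithmetic. The sketch below assumes a regular chunk grid and uses made-up values (a 100 x 250 float32 array with 32 x 64 chunks); it mirrors the property names for readability but does not use the library.

```python
import math

# Example values (assumptions for this sketch).
shape = (100, 250)
chunk_shape = (32, 64)
itemsize = 4  # bytes per element for float32

# Number of chunks along each dimension: ceiling division of shape by chunk shape.
chunk_grid_shape = tuple(-(-s // c) for s, c in zip(shape, chunk_shape))

# Element counts are the product of the respective shapes.
size = math.prod(shape)
chunk_size = math.prod(chunk_shape)

# Uncompressed byte sizes follow from the element size.
nbytes = size * itemsize
chunk_nbytes = chunk_size * itemsize
```

Note that the chunks together cover more elements than the array holds whenever the shape is not a multiple of the chunk shape, as is the case here.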

get_chunk(index) ndarray

Read a chunk from the store.

This function is synchronous; you may want to use get_chunk_future() to do the loading and decompression in a separate thread.

Converts the index to the path for that chunk, loads the bytes from the store, and decodes them into a numpy array.
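
As an illustration of the first step, the sketch below maps a chunk index to a store key. It assumes the Zarr v3 "default" chunk key encoding with a "/" separator; the actual encoding is taken from the array metadata, and the helper name chunk_key is hypothetical rather than part of simplezarr.

```python
def chunk_key(array_path, index, separator="/"):
    """Build the store key for a chunk, e.g. 'images/raw/c/1/0/3'."""
    # The default encoding prefixes the chunk coordinates with "c".
    encoded = separator.join(["c", *(str(i) for i in index)])
    # The root array has an empty path, so there is nothing to prepend.
    return f"{array_path}/{encoded}" if array_path else encoded


key_3d = chunk_key("images/raw", (1, 0, 3))
key_scalar = chunk_key("", ())
```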

get_chunk_future(index) Future[ndarray]

Read a chunk and return a concurrent.futures.Future.

The loading happens in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple reads in parallel.

This has little to do with async programming and asyncio, although the future-object can be converted to an awaitable using asyncio.wrap_future().

Example to wait for the data:

f = zarr_array.get_chunk_future(...)
data = f.result()

Combine multiple reads in parallel:

f1 = zarr_array.get_chunk_future(...)
f2 = zarr_array.get_chunk_future(...)
f3 = zarr_array.get_chunk_future(...)

data1, data2, data3 = [f.result() for f in [f1, f2, f3]]

Asynchronously await the data:

f = zarr_array.get_chunk_future(...)
data = await asyncio.wrap_future(f)

Async and parallel reads:

f1 = zarr_array.get_chunk_future(...)
f2 = zarr_array.get_chunk_future(...)
f3 = zarr_array.get_chunk_future(...)

asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
data1, data2, data3 = await asyncio.gather(*asyncio_futures)

set_chunk(data, index, check_empty=True) None

Write a chunk to the store.

Converts the index to the path for that chunk, encodes the array to bytes, and saves these to the store. This function is blocking (no threading or async).

set_chunk_future(data, index) Future[None]

Write a chunk and return a concurrent.futures.Future.

The writing happens in a separate thread (using a ThreadPoolExecutor). One can wait for the result, and also combine multiple writes in parallel.

This has little to do with async programming and asyncio, although the future-object can be converted to an awaitable using asyncio.wrap_future().

Example to write and forget:

f = zarr_array.set_chunk_future(...)

Combine multiple writes in parallel, and wait for them to finish:

f1 = zarr_array.set_chunk_future(...)
f2 = zarr_array.set_chunk_future(...)
f3 = zarr_array.set_chunk_future(...)

for f in [f1, f2, f3]:
    f.result()

Asynchronously await the write:

f = zarr_array.set_chunk_future(...)
await asyncio.wrap_future(f)

Async and parallel writes:

f1 = zarr_array.set_chunk_future(...)
f2 = zarr_array.set_chunk_future(...)
f3 = zarr_array.set_chunk_future(...)

asyncio_futures = [asyncio.wrap_future(f) for f in [f1, f2, f3]]
await asyncio.gather(*asyncio_futures)
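
The future-based patterns above can be tried without a Zarr file. The sketch below replaces the chunk read with a stand-in function that sleeps briefly; load_chunk, load_chunk_future and the executor are hypothetical names for this example, and only the standard-library calls (ThreadPoolExecutor.submit, Future.result, asyncio.wrap_future, asyncio.gather) are real.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def load_chunk(index):
    """Stand-in for a blocking chunk read (hypothetical)."""
    time.sleep(0.05)
    return f"chunk-{index}"


def load_chunk_future(index):
    """Run the blocking read in a worker thread and return a Future."""
    return executor.submit(load_chunk, index)


# Synchronous use: start all reads first, then wait for each result.
futures = [load_chunk_future(i) for i in range(3)]
sync_results = [f.result() for f in futures]


async def main():
    # Async use: wrap each concurrent.futures.Future so it can be awaited.
    futures = [load_chunk_future(i) for i in range(3)]
    return await asyncio.gather(*(asyncio.wrap_future(f) for f in futures))


async_results = asyncio.run(main())
executor.shutdown()
```

In both variants the work is submitted before anything waits, so the three reads overlap instead of running one after another.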