A command-line tool for exploring huge AnnData stores (.h5ad and .zarr) without loading them fully into memory. Streams data directly from disk for efficient inspection of structure, metadata, and matrices.
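Here is a minimal sketch of the chunked, row-block access pattern that streaming implies. It is not h5ad-cli's own code. The function name `csr_row_sums` and the `chunk_rows` parameter are hypothetical. `indptr` and `data` are assumed to be any sliceable sequences, such as Python lists, NumPy arrays, or h5py datasets. With h5py datasets, each slice reads only the requested range from the file.

```python
from typing import List, Sequence


def csr_row_sums(indptr: Sequence[int], data: Sequence[float],
                 chunk_rows: int = 1000) -> List[float]:
    """Hypothetical helper: per-row sums of a CSR matrix, one row block at a time."""
    sums: List[float] = []
    n_rows = len(indptr) - 1
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        ptr = indptr[start:stop + 1]      # row offsets for this block only
        block = data[ptr[0]:ptr[-1]]      # one contiguous read of the stored values
        base = ptr[0]
        for i in range(stop - start):
            # Values of row (start + i) sit between consecutive offsets.
            sums.append(sum(block[ptr[i] - base:ptr[i + 1] - base]))
    return sums
```

Peak memory is bounded by the size of one row block rather than by the whole matrix, which is why this pattern suits stores that do not fit in RAM.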
- Streaming access to very large .h5ad and .zarr stores
- Auto-detects .h5ad files vs. .zarr directories
- Chunked processing for dense and sparse matrices (CSR/CSC)
- Rich terminal output with progress indicators
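One plausible way to tell the two store types apart is to check the path on disk. The sketch below is a guess at the approach, not the tool's actual logic. A Zarr store is a directory holding a `.zgroup` file (Zarr v2) or `zarr.json` (Zarr v3). An `.h5ad` file is an HDF5 file, whose 8-byte signature `\x89HDF\r\n\x1a\n` normally appears at offset 0. The name `detect_store_kind` is hypothetical. The sketch also ignores HDF5 files that carry a user block, where the signature sits at a later offset.

```python
from pathlib import Path

# HDF5 format signature (checked only at offset 0 in this sketch).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_store_kind(path) -> str:
    """Hypothetical helper: return "zarr" or "h5ad" for a store path."""
    p = Path(path)
    if p.is_dir():
        # Zarr v2 groups have .zgroup; Zarr v3 nodes have zarr.json.
        if (p / ".zgroup").exists() or (p / "zarr.json").exists():
            return "zarr"
        raise ValueError(f"{p} is a directory but not a Zarr store")
    if p.is_file():
        with p.open("rb") as fh:
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "h5ad"
        raise ValueError(f"{p} is not an HDF5 file")
    raise FileNotFoundError(p)
```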
Using uv (recommended):
git clone https://github.com/cellgeni/h5ad-cli.git
cd h5ad-cli
uv sync
For development and testing:
uv sync --extra dev
Alternative with pip:
git clone https://github.com/cellgeni/h5ad-cli.git
cd h5ad-cli
pip install .
For development and testing with pip:
pip install -e ".[dev]"
See docs/TESTING.md for testing documentation.
Run help at any level (e.g. uv run h5ad --help, uv run h5ad export --help).
- info – read-only inspection of store layout, shapes, and type hints; supports drilling into paths like obsm/X_pca or uns.
- subset – stream and write a filtered copy based on obs/var name lists, preserving dense and sparse matrix encodings.
- export – extract data from a store; subcommands: dataframe (obs/var to CSV), array (dense to .npy), sparse (CSR/CSC to .mtx), dict (JSON), image (PNG).
- import – write new data into a store; subcommands: dataframe (CSV → obs/var), array (.npy), sparse (.mtx), dict (JSON).
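The core step of subset, filtering a CSR matrix by obs and var name lists while keeping it sparse, can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation. `subset_csr` is a hypothetical name. The inputs are assumed to be plain sequences. A real streaming version would process rows in blocks and write each block straight to the output store. Rows and columns keep their original order. Entries in dropped columns are discarded, and the surviving column indices are renumbered.

```python
from typing import List, Sequence, Tuple


def subset_csr(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
               obs_names: Sequence[str], var_names: Sequence[str],
               keep_obs: Sequence[str], keep_var: Sequence[str],
               ) -> Tuple[List[int], List[int], List[float], List[str], List[str]]:
    """Hypothetical helper: select rows/columns of a CSR matrix by name."""
    keep_obs_set, keep_var_set = set(keep_obs), set(keep_var)
    rows = [i for i, name in enumerate(obs_names) if name in keep_obs_set]
    cols = [j for j, name in enumerate(var_names) if name in keep_var_set]
    col_map = {old: new for new, old in enumerate(cols)}  # old column -> new column

    new_indptr, new_indices, new_data = [0], [], []
    for r in rows:
        for k in range(indptr[r], indptr[r + 1]):
            new_col = col_map.get(indices[k])
            if new_col is not None:  # drop entries in unselected columns
                new_indices.append(new_col)
                new_data.append(data[k])
        new_indptr.append(len(new_data))
    return (new_indptr, new_indices, new_data,
            [obs_names[i] for i in rows], [var_names[j] for j in cols])
```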
See docs/GET_STARTED.md for a short tutorial.
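The `export sparse` command targets the Matrix Market (`.mtx`) coordinate format. The following is a minimal, illustrative writer for a CSR matrix. It is not h5ad-cli's code. `write_csr_as_mtx` is a hypothetical name, and values are assumed to be real numbers. The file starts with a header line and a `rows cols nnz` size line. Each entry follows as a 1-based `row col value` triple. For in-memory SciPy matrices, `scipy.io.mmwrite` produces the same format.

```python
import io
from typing import Sequence, TextIO


def write_csr_as_mtx(fh: TextIO, indptr: Sequence[int], indices: Sequence[int],
                     data: Sequence[float], n_cols: int) -> None:
    """Hypothetical helper: write CSR arrays as a Matrix Market coordinate file."""
    n_rows = len(indptr) - 1
    fh.write("%%MatrixMarket matrix coordinate real general\n")
    fh.write(f"{n_rows} {n_cols} {indptr[-1]}\n")  # indptr[-1] == number of stored entries
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            # Matrix Market row and column indices are 1-based.
            fh.write(f"{row + 1} {indices[k] + 1} {data[k]}\n")


# Usage: a 3 x 3 matrix with an empty middle row.
buf = io.StringIO()
write_csr_as_mtx(buf, [0, 1, 1, 3], [2, 0, 1], [1.5, 2.0, 3.0], n_cols=3)
print(buf.getvalue())
```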
A Docker image is available on Quay.io: quay.io/cellgeni/h5ad-cli:latest. Pull and run it, mounting a local data directory at /data:
docker run --rm -it -v /path/to/data:/data quay.io/cellgeni/h5ad-cli:latest h5ad info /data/your_file.h5ad