Scalable Storage of Tensor Data for Scientific Computing

Legacy EOSS

2019

Proposal Summary

To establish Zarr as a foundation for scientific data storage, with clear data format and protocol specifications, implementations in multiple programming languages, and a community process for evolving the format to support new scientific applications.


Zarr

Zarr is a specification for storing chunked, compressed, N-dimensional arrays, together with implementations of that specification in several languages and an associated ecosystem of tools and integrations that use them. It is broadly used in biomedicine (malaria genomics, scRNAseq, spatial transcriptomics, neuroscience, etc.) and beyond. It fills a need for simple, scalable N-dimensional array storage in the cloud era, and scientists from many disciplines use it where they have historically used HDF5. It has a vibrant open ecosystem and a distributed, grassroots developer and user base, and it supports the NetCDF data model, allowing drop-in integration with a wide variety of systems and use cases.
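
To make "chunked, compressed" concrete, here is a minimal standard-library sketch of the general idea: split an array into fixed-size chunks, compress each chunk independently, and store it under its own key so that a reader only has to fetch and decompress the chunk it needs. This is an illustration, not Zarr's implementation or on-disk format. The function names (write_chunked, read_element), the use of zlib, and the plain dict standing in for a directory or cloud bucket are all assumptions made for the example. The dotted keys such as "0.1" imitate the chunk naming of the Zarr version 2 format, but the byte layout here is simplified.

```python
import struct
import zlib


def write_chunked(data, chunk_shape):
    """Split a 2-D list of ints into chunks and compress each chunk separately.

    Hypothetical helper for illustration; returns a dict mapping chunk keys
    ("row.col" chunk indices) to compressed bytes.
    """
    rows, cols = len(data), len(data[0])
    crow, ccol = chunk_shape
    store = {}  # stands in for a directory or an object-store bucket
    for i in range(0, rows, crow):
        for j in range(0, cols, ccol):
            # Gather the chunk in row-major order (edge chunks may be smaller).
            flat = [
                v
                for r in range(i, min(i + crow, rows))
                for v in data[r][j:j + ccol]
            ]
            raw = struct.pack(f"<{len(flat)}i", *flat)  # little-endian int32
            store[f"{i // crow}.{j // ccol}"] = zlib.compress(raw)
    return store


def read_element(store, shape, chunk_shape, r, c):
    """Read one element by decompressing only the chunk that contains it."""
    crow, ccol = chunk_shape
    ci, cj = r // crow, c // ccol
    # Actual width of this chunk (smaller at the right edge of the array).
    width = min(ccol, shape[1] - cj * ccol)
    raw = zlib.decompress(store[f"{ci}.{cj}"])
    values = struct.unpack(f"<{len(raw) // 4}i", raw)
    return values[(r % crow) * width + (c % ccol)]


# A 4 x 6 array stored as four 2 x 3 chunks.
data = [[r * 6 + c for c in range(6)] for r in range(4)]
store = write_chunked(data, (2, 3))
value = read_element(store, (4, 6), (2, 3), 3, 4)
```

Because every chunk is an independent object, separate chunks can be read or written in parallel, which is what makes this layout a good fit for cloud object stores.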

Project Team

Ryan Williams

Mount Sinai School of Medicine

Alistair Miles

University of Oxford

John Kirkham

NVIDIA

Josh Moore

University of Dundee