canvod-store
Purpose
The canvod-store package provides versioned storage management for GNSS vegetation optical depth data using Icechunk — a cloud-native transactional format for multidimensional arrays built on Zarr v3.
- Git-like versioning: Every write produces an Icechunk snapshot with a hash-addressable ID. Roll back to any earlier state, audit every append, and reproduce any result published from the store.
- Cloud-native: S3-compatible backends (AWS, MinIO, Cloudflare R2), plus the local filesystem for development. Switching backends requires no code changes.
- Chunked time-series access: Default chunks are `epoch: 34560, sid: -1`, that is, 34,560 epochs per chunk (9.6 hours of 1 Hz data), with `-1` following the usual convention of a single chunk spanning the full sid axis. Chunks are Zstd-compressed, and an epoch-range read only touches the chunks that overlap the requested range.
- Hash deduplication: The SHA-256 of each raw RINEX file is stored as "File Hash". Re-submitting the same file is a no-op, so pipelines are safe to re-run.
Architecture
graph TD
A1["`**GNSS Data (RINEX / SBF)**
epoch x sid`"]
A1 --> B["`**Preprocessing**
encoding, padding`"]
B --> C["Icechunk Repository"]
C --> D1["`**obs group**
receiver/obs/
epoch x sid`"]
D1 --> E["VOD Analysis"]
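The padding done in the Preprocessing node can be sketched in plain Python. This is a minimal illustration, assuming that padding means aligning each file's sid axis to a shared, store-wide list of signal IDs and filling the missing signals with NaN. The function name `pad_to_global_sid` and the SID labels are hypothetical and are not taken from canvod-store.

```python
import math


def pad_to_global_sid(
    obs: dict[str, list[float]],
    global_sids: list[str],
    n_epochs: int,
) -> dict[str, list[float]]:
    """Return a copy of `obs` with one column per global SID.

    SIDs missing from `obs` are filled with NaN, so every file ends up
    with the same (epoch x sid) shape and can be appended to one array.
    """
    unknown = set(obs) - set(global_sids)
    if unknown:
        raise ValueError(f"SIDs not in global list: {sorted(unknown)}")
    return {
        sid: list(obs[sid]) if sid in obs else [math.nan] * n_epochs
        for sid in global_sids
    }


# Hypothetical SID labels; the real naming scheme is defined by canvod.
global_sids = ["G01|L1|C", "G02|L1|C", "E05|E1|C"]
file_obs = {"G02|L1|C": [41.0, 42.5]}  # two epochs, one observed signal
padded = pad_to_global_sid(file_obs, global_sids, n_epochs=2)
```

With a fixed sid axis, appends along epoch never change the array shape in the other dimension, which is what makes a single chunk across sid practical.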
Core Components
from canvod.store import MyIcechunkStore
store = MyIcechunkStore(store_path, strategy="append")
store.write(dataset)

from canvod.store import IcechunkDataReader
reader = IcechunkDataReader(store_path)
ds = reader.read(time_range=("2024-01-01", "2024-12-31"))

from canvodpy import Site
site = Site("Rosalia")
site.rinex_store.list_groups()  # ["canopy_01", "reference_01"]
site.rinex_store.get_group_info("canopy_01")
Storage Layout
{receiver_name}/
└── obs/ # Observations (epoch × sid)
├── SNR
├── Phase
├── Pseudorange
├── Doppler
└── ...
Data Flow
- Ingest: Raw GNSS data (RINEX via `Rnxv3Obs` or SBF via `SbfReader`) plus ephemerides
- Preprocess: Normalise encodings, pad to global SID, strip fill values
- Store observations: Append to `{group}/obs/` with "File Hash" deduplication
- Query: Retrieve by time range, signal, or group name
- Analyse: VOD calculation using stored observations and grid geometry
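The "File Hash" check in the store step can be illustrated with a small standard-library sketch. It assumes the hash is computed over the raw file bytes and that the hashes already recorded in the store are available as a set. The helpers `file_sha256` and `should_ingest` are hypothetical names, not part of the canvod-store API.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def should_ingest(path: Path, known_hashes: set[str]) -> bool:
    """Hypothetical check: True only if this file's hash is new.

    The hash is recorded on first sight, so a repeat call is a no-op.
    """
    file_hash = file_sha256(path)
    if file_hash in known_hashes:
        return False
    known_hashes.add(file_hash)
    return True


# Demo with a throwaway file standing in for a RINEX observation file.
with tempfile.TemporaryDirectory() as tmp:
    rinex = Path(tmp) / "example.rnx"
    rinex.write_bytes(b"stand-in RINEX content\n")
    seen: set[str] = set()
    first = should_ingest(rinex, seen)   # new file: ingest
    second = should_ingest(rinex, seen)  # same bytes again: skip
```

Because the digest depends only on the file bytes, renaming or re-downloading an identical file still resolves to the same hash and is skipped.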
Storage Format
| Property | Value |
|---|---|
| Backend format | Icechunk (Zarr v3) |
| Default chunks | epoch: 34560, sid: -1 |
| Compression | Zstd level 5 |
| Storage backends | S3-compatible (AWS, MinIO, Cloudflare R2), local filesystem |
| Versioning | Git-like snapshots, hash-addressable |
| Deduplication | SHA-256 "File Hash" per RINEX file |
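As a worked check on the default chunk size in the table, the sketch below computes how much time one epoch chunk covers at a given sampling interval and which chunks an epoch-index range overlaps. The function names are illustrative only, and the 1 s sampling interval is an assumption based on the 1 Hz figure quoted earlier.

```python
EPOCH_CHUNK = 34560  # default epoch chunk length from the table above


def chunk_hours(chunk_len: int, sampling_interval_s: float) -> float:
    """Hours of data covered by one chunk along the epoch axis."""
    return chunk_len * sampling_interval_s / 3600.0


def chunks_touched(start: int, stop: int, chunk_len: int = EPOCH_CHUNK) -> range:
    """Chunk indices overlapped by the half-open epoch index range [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk_len, (stop - 1) // chunk_len + 1)


hours_at_1hz = chunk_hours(EPOCH_CHUNK, 1.0)  # 34560 s / 3600 = 9.6 hours
one_day = chunks_touched(0, 86400)            # one day at 1 Hz spans chunks 0, 1, 2
```

The number of chunks read grows with the length of the requested range, not with the total size of the store, which is the practical benefit of chunking along epoch.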