Storage Strategies
The write strategy controls how MyIcechunkStore handles incoming data when the store already holds data for the same time range. Choose one strategy per store type and set it in processing.yaml.
- Skip
  No-op if any data already exists for the incoming time range. Safe for immutable raw observations; it never overwrites.
  Best for: initial ingestion, pipeline restarts
- Overwrite
  Deletes existing data for the time range, then writes fresh. Each run produces a new Icechunk snapshot for audit.
  Best for: reprocessed results, algorithm updates
- Append
  Merges new data with existing, extending the time series. Handles overlapping epochs by keeping existing values.
  Best for: continuous monitoring, daily live ingestion
Behaviour Reference
| Strategy | Data exists | Data missing | Version snapshot | Speed |
|---|---|---|---|---|
| skip | No write | Write | On write | Fast |
| overwrite | Delete + write | Write | Always | Medium |
| append | Merge | Write | Always | Slower |
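The table can be read as a small decision function. The sketch below models a store as a plain dict mapping an integer epoch index to a value. The function `apply_strategy` and the dict layout are hypothetical illustrations, not the MyIcechunkStore implementation, which is not shown on this page. It returns the new contents and whether a snapshot would be created.

```python
from typing import Dict, Tuple

Epochs = Dict[int, float]  # hypothetical layout: epoch index -> value


def apply_strategy(
    existing: Epochs, incoming: Epochs, strategy: str
) -> Tuple[Epochs, bool]:
    """Return (new contents, snapshot_created) for one incoming time range.

    "Data exists" is modelled as: the store already holds at least one
    epoch inside the incoming time range.
    """
    if not incoming:
        # Assumption: an empty write is a no-op for every strategy.
        return dict(existing), False

    lo, hi = min(incoming), max(incoming)
    in_range = {e for e in existing if lo <= e <= hi}

    if strategy == "skip":
        if in_range:
            return dict(existing), False  # no write, no snapshot
        return {**existing, **incoming}, True
    if strategy == "overwrite":
        # Delete everything in the incoming time range, then write fresh.
        kept = {e: v for e, v in existing.items() if e not in in_range}
        return {**kept, **incoming}, True
    if strategy == "append":
        # Existing values win where epochs overlap.
        return {**incoming, **existing}, True
    raise ValueError(f"unknown strategy: {strategy!r}")


store = {0: 1.0, 1: 2.0}
new = {1: 9.0, 2: 3.0}
for name in ("skip", "overwrite", "append"):
    print(name, apply_strategy(store, new, name))
```

In this model only epoch 1 overlaps: skip leaves the store untouched, overwrite replaces epoch 1 with the new value, and append keeps the old value for epoch 1 while adding epoch 2.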
Usage
```python
from canvod.store import MyIcechunkStore

# Raw observations: skip if already ingested
rinex_store = MyIcechunkStore(
    "/data/stores/rosalia/rinex",
    strategy="skip",
)

# Processed VOD: rewrite on algorithm update
vod_store = MyIcechunkStore(
    "/data/stores/rosalia/vod",
    strategy="overwrite",
)

# Monitoring: extend daily
live_store = MyIcechunkStore(
    "/data/stores/live/rinex",
    strategy="append",
)
```
In processing.yaml:

```yaml
storage:
  rinex_store_strategy: skip       # raw observations are immutable
  vod_store_strategy: overwrite    # recompute as algorithms improve
```
The Site object reads these keys automatically:
```python
from canvod.site import Site

site = Site("Rosalia")        # strategy from config
site.rinex_store.strategy     # → "skip"
site.vod_store.strategy       # → "overwrite"
```
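How Site resolves these keys internally is not shown here. As a rough sketch of the lookup, assuming the config has already been parsed into a dict and that each store uses a `<name>_store_strategy` key as in the example above, a validating helper could look like this. The names `resolve_strategy` and `VALID_STRATEGIES` are hypothetical and not part of canvod.

```python
VALID_STRATEGIES = {"skip", "overwrite", "append"}


def resolve_strategy(config: dict, store_name: str) -> str:
    """Look up `<store_name>_store_strategy` under `storage` and validate it."""
    key = f"{store_name}_store_strategy"
    try:
        strategy = config["storage"][key]
    except KeyError:
        raise KeyError(f"missing storage.{key} in config") from None
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"storage.{key} must be one of {sorted(VALID_STRATEGIES)}, "
            f"got {strategy!r}"
        )
    return strategy


# Parsed equivalent of the processing.yaml fragment above.
config = {
    "storage": {
        "rinex_store_strategy": "skip",
        "vod_store_strategy": "overwrite",
    }
}
print(resolve_strategy(config, "rinex"))
print(resolve_strategy(config, "vod"))
```

Failing loudly on a missing or misspelled value is a design choice in this sketch; it avoids silently falling back to a strategy that could overwrite data.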
Recommended Defaults
Raw RINEX observations → skip
Raw GNSS data never changes after collection. Skip prevents accidental re-ingestion and keeps ingest pipelines idempotent — safe to restart at any point.
Processed VOD products → overwrite
As the tau-omega inversion improves or auxiliary data quality changes, re-running the pipeline should replace old values. Each overwrite creates a new Icechunk snapshot so you can compare before/after.
Continuous monitoring → append
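Use append only when you are truly extending a live time series. It is slower than the other strategies and may produce unexpected results if a day is partially processed and then re-submitted: because append keeps existing values for overlapping epochs, the re-submitted values for those epochs are not written.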
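The re-submission pitfall follows from the "keep existing values" rule for overlapping epochs. The toy model below illustrates it with plain dicts keyed by an integer epoch index; `append_merge` is a hypothetical stand-in for the merge step, and the values are made up for the example.

```python
def append_merge(existing: dict, incoming: dict) -> dict:
    """Model of append: add new epochs, keep existing values on overlap."""
    return {**incoming, **existing}


# First run: the day was only partially processed, and the value
# written for epoch 1 later turns out to be wrong.
store = {0: 0.42, 1: 0.10}

# Second run: the full day is re-submitted with a corrected epoch 1.
resubmitted = {0: 0.42, 1: 0.45, 2: 0.47}

merged = append_merge(store, resubmitted)
print(merged[1])  # still the value from the first run
print(merged[2])  # the new epoch is added
```

In this model the corrected value for epoch 1 never reaches the store. When earlier values need to be replaced, overwrite is the strategy that matches that intent.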
Performance
| Strategy | Typical write throughput | Storage overhead | Re-ingest safety |
|---|---|---|---|
| skip | Fastest (hash check only) | None | Safe |
| overwrite | Moderate (delete + write) | Low once old chunks are garbage-collected | Safe |
| append | Slowest (read-merge-write) | Higher (old + new chunks) | Risky |
Garbage collection
Overwritten chunks remain in the Icechunk object store until you run GC. The old versions are still accessible via snapshot IDs — useful for auditing before cleaning up.