Architecture and Design Patterns
This page documents the design principles and extensibility patterns used throughout canVODpy.
Design Principles
- Modularity: Independent packages with minimal coupling. Install only what you need; replace only what you must.
- Extensibility: ABC + Factory pattern throughout. Add a custom reader, grid, or VOD algorithm in fewer than 50 lines, with no framework internals to understand.
- Type Safety: Modern Python type hints and Pydantic validation at every boundary. Errors surface at construction time, not during analysis.
- Scientific Focus: Explicit over implicit. Reproducible by default: every dataset is traceable to a config version, file hash, and Icechunk snapshot ID.
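The Type Safety principle (errors at construction time rather than during analysis) can be illustrated with a small sketch. It uses a standard-library dataclass as a stand-in for Pydantic so that it runs without third-party packages; `GridSpec`, its field, and `try_build` are hypothetical names and not part of canVODpy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical config object (not a canVODpy class)."""

    angular_resolution: float  # degrees

    def __post_init__(self) -> None:
        # Validate while the object is being built, so a bad value
        # never reaches the analysis code.
        if not 0.0 < self.angular_resolution <= 90.0:
            raise ValueError(
                f"angular_resolution must be in (0, 90], got {self.angular_resolution}"
            )


def try_build(resolution: float) -> str:
    """Return 'ok' or the validation message for a given resolution."""
    try:
        GridSpec(angular_resolution=resolution)
    except ValueError as exc:
        return str(exc)
    return "ok"


print(try_build(5.0))
print(try_build(-1.0))
```

With Pydantic the same check would typically be expressed as a field constraint or validator; the point of the sketch is only where the error appears.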
The Sollbruchstellen Principle
canVODpy applies the engineering concept of Sollbruchstellen (predetermined breaking points): packages are designed to be independent so they can be used separately or replaced without affecting the rest of the system.
Foundation (0 inter-package dependencies):
canvod-readers, canvod-grids, canvod-vod, canvod-utils,
canvod-virtualiconvname
Consumer (1–2 dependencies each):
canvod-auxiliary → canvod-readers
canvod-viz → canvod-grids
canvod-store → canvod-grids
canvod-store-metadata → canvod-utils
canvod-ops → canvod-grids, canvod-utils
Orchestration:
canvodpy → all packages
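The layering above can be checked mechanically. The following is a minimal sketch, assuming the dependency listing is complete: it encodes the listing as a dictionary, verifies that consumer packages depend only on foundation packages, and derives a valid build order with the standard-library `graphlib`. The helper `layering_violations` is a hypothetical name introduced for illustration.

```python
from graphlib import TopologicalSorter

# Foundation packages: no inter-package dependencies.
FOUNDATION = {
    "canvod-readers", "canvod-grids", "canvod-vod",
    "canvod-utils", "canvod-virtualiconvname",
}

# Package -> packages it depends on, copied from the listing above.
DEPENDENCIES: dict[str, set[str]] = {
    **{name: set() for name in FOUNDATION},
    "canvod-auxiliary": {"canvod-readers"},
    "canvod-viz": {"canvod-grids"},
    "canvod-store": {"canvod-grids"},
    "canvod-store-metadata": {"canvod-utils"},
    "canvod-ops": {"canvod-grids", "canvod-utils"},
}
# The orchestration package depends on every other package.
DEPENDENCIES["canvodpy"] = set(DEPENDENCIES)


def layering_violations(deps: dict[str, set[str]]) -> list[str]:
    """Return non-orchestration packages that depend on a non-foundation package."""
    return sorted(
        pkg for pkg, needs in deps.items()
        if pkg != "canvodpy" and not needs <= FOUNDATION
    )


# static_order() raises graphlib.CycleError if the graph contains a cycle.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())

print(layering_violations(DEPENDENCIES))
print(build_order[-1])
```

A check like this could run in CI to catch an accidental dependency between two consumer packages before it erodes the breaking points.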
ABC + Factory Pattern
flowchart TD
subgraph ABCS["Abstract Base Classes"]
READER_ABC["`**GNSSDataReader**
to_ds, iter_epochs, file_hash`"]
GRID_ABC["`**BaseGridBuilder**
build`"]
VOD_ABC["`**VODCalculator**
calculate_vod`"]
end
subgraph FACTORIES["Factory Registry"]
RF["ReaderFactory"]
GF["GridFactory"]
VF["VODFactory"]
end
subgraph BUILTIN["Built-in"]
RINEX3["Rnxv3Obs"]
EA["EqualAreaBuilder"]
HP["HEALPixBuilder"]
TO["TauOmegaZerothOrder"]
end
subgraph CUSTOM["User Extension"]
IMPL["`**Custom class**
inherits ABC`"]
REG["Factory.register()"]
end
READER_ABC --> RF
GRID_ABC --> GF
VOD_ABC --> VF
RINEX3 --> RF
EA --> GF
HP --> GF
TO --> VF
IMPL --> REG --> RF
Registration + Usage
import xarray as xr
from pydantic import ConfigDict

from canvodpy import ReaderFactory
from canvod.readers import GNSSDataReader
from canvod.readers.builder import DatasetBuilder


class MyLabReader(GNSSDataReader):
    """GNSSDataReader is a Pydantic BaseModel + ABC, so one parent is enough."""

    model_config = ConfigDict(frozen=True)
    # fpath is inherited from GNSSDataReader; no need to redeclare it

    def to_ds(self, keep_data_vars=None, **kwargs) -> xr.Dataset:
        builder = DatasetBuilder(self)
        for epoch in self.iter_epochs():
            ei = builder.add_epoch(epoch.timestamp)
            for obs in epoch.observations:
                sig = builder.add_signal(sv=obs.sv, band=obs.band, code=obs.code)
                builder.set_value(ei, sig, "SNR", obs.snr)
        return builder.build(keep_data_vars=keep_data_vars)

    # ... implement remaining abstract methods ...


# Register once (at import time)
ReaderFactory.register("mylab_v1", MyLabReader)

# Use anywhere (path points to a raw observation file)
reader = ReaderFactory.create("mylab_v1", fpath=path)
ds = reader.to_ds()
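To make the register/create flow concrete, here is a dependency-free sketch of how an ABC-backed registry could work. It is not canVODpy's `ReaderFactory`: the classes `Reader`, `SimpleFactory`, and `EchoReader` are hypothetical stand-ins, and the real factory may validate or construct objects differently.

```python
from abc import ABC, abstractmethod
from typing import Any


class Reader(ABC):
    """Hypothetical stand-in for an abstract base such as GNSSDataReader."""

    def __init__(self, fpath: str) -> None:
        self.fpath = fpath

    @abstractmethod
    def to_ds(self) -> dict[str, Any]:
        """Return the parsed data (a plain dict here, for illustration)."""


class SimpleFactory:
    """Hypothetical name -> class registry."""

    _registry: dict[str, type[Reader]] = {}

    @classmethod
    def register(cls, name: str, impl: type[Reader]) -> None:
        # Reject classes that do not implement the expected interface.
        if not issubclass(impl, Reader):
            raise TypeError(f"{impl.__name__} must inherit from Reader")
        cls._registry[name] = impl

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> Reader:
        try:
            impl = cls._registry[name]
        except KeyError:
            known = sorted(cls._registry)
            raise ValueError(f"unknown reader {name!r}; known: {known}") from None
        return impl(**kwargs)


class EchoReader(Reader):
    """Trivial implementation that reports where it would read from."""

    def to_ds(self) -> dict[str, Any]:
        return {"source": self.fpath}


SimpleFactory.register("echo", EchoReader)
reader = SimpleFactory.create("echo", fpath="station.obs")
print(reader.to_ds())
```

Keeping the registry keyed by a plain string is what lets configuration files select an implementation by name without importing it directly.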
Unified API Surface
canvodpy exposes four API levels, all backed by the same packages.
Level 1, convenience functions:
from canvodpy import process_date, calculate_vod

data = process_date("Rosalia", "2025001")
vod = calculate_vod("Rosalia", "canopy_01", "reference_01", "2025001")
Level 2, fluent workflow. Steps are recorded but not executed until a terminal method is called.
import canvodpy

result = (
    canvodpy.workflow("Rosalia")
    .read("2025001")
    .preprocess()
    .grid("equal_area", angular_resolution=5.0)
    .vod("canopy_01", "reference_01")
    .result()
)

# Preview without executing
plan = canvodpy.workflow("Rosalia").read("2025001").preprocess().explain()
Level 3, workflow class:
from canvodpy import VODWorkflow

wf = VODWorkflow(site="Rosalia", grid="equal_area")
datasets = wf.process_date("2025001")
vod = wf.calculate_vod("canopy_01", "reference_01", "2025001")
Level 4, direct package use:
from pathlib import Path

from canvod.readers import Rnxv3Obs
from canvod.grids import EqualAreaBuilder
from canvod.vod import TauOmegaZerothOrder

reader = Rnxv3Obs(fpath=Path("station.25o"))
ds = reader.to_ds(keep_data_vars=["SNR"])
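The deferred execution described for the fluent workflow (steps are recorded, then run by a terminal method) follows a common builder pattern. The sketch below shows the idea in isolation; `LazyPipeline` and its `step` method are hypothetical and make no claim about how `canvodpy.workflow` is implemented. Only the method names `explain` and `result` are borrowed from the examples above.

```python
from typing import Callable

Step = tuple[str, Callable[[int], int]]


class LazyPipeline:
    """Hypothetical deferred-execution builder."""

    def __init__(self, steps: tuple[Step, ...] = ()) -> None:
        self._steps = steps

    def step(self, name: str, func: Callable[[int], int]) -> "LazyPipeline":
        # Record the step and return a new pipeline; nothing runs yet.
        return LazyPipeline(self._steps + ((name, func),))

    def explain(self) -> list[str]:
        """Preview the recorded step names without executing them."""
        return [name for name, _ in self._steps]

    def result(self, value: int) -> int:
        """Terminal method: run every recorded step in order."""
        for _, func in self._steps:
            value = func(value)
        return value


calls: list[str] = []  # records which steps have actually executed


def double(x: int) -> int:
    calls.append("double")
    return 2 * x


def increment(x: int) -> int:
    calls.append("increment")
    return x + 1


pipeline = LazyPipeline().step("double", double).step("increment", increment)
plan = pipeline.explain()        # no step has run at this point
executed_before = list(calls)
output = pipeline.result(10)     # runs double, then increment
```

Returning a new pipeline from each call keeps partially built workflows reusable, which is one reason a preview method can exist without side effects.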
Configuration Management
flowchart TD
subgraph FILES["YAML Files"]
PROC["processing.yaml"]
SITES["sites.yaml"]
SIDS["sids.yaml"]
DEF["Package defaults"]
end
subgraph LOAD["Loader"]
MERGE["`**Deep merge**
user overrides defaults`"]
PYDANTIC["Pydantic validation"]
end
subgraph VALIDATED["CanvodConfig"]
PC["ProcessingConfig"]
SC["SitesConfig"]
SIC["SidsConfig"]
end
PROC & SITES & SIDS & DEF --> MERGE --> PYDANTIC
PYDANTIC --> PC & SC & SIC
from canvod.utils.config import load_config
cfg = load_config()
cfg.processing.aux_data.nasa_earthdata_acc_mail
cfg.processing.storage.stores_root_dir
just config-init # Create config files from templates
just config-validate # Validate current config
just config-show # Print resolved config
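The "deep merge, user overrides defaults" step in the diagram can be sketched as a recursive dictionary merge. This is an illustration under assumptions, not the loader in `canvod.utils.config`: the function name `deep_merge`, the `compression` key, and all values are hypothetical; only `storage` and `stores_root_dir` come from the attribute paths shown above.

```python
from copy import deepcopy
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in `overrides` win; nested dicts are merged."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical package defaults and user file contents.
package_defaults = {
    "storage": {"stores_root_dir": "/tmp/stores", "compression": "zstd"},
}
user_config = {"storage": {"stores_root_dir": "/data/stores"}}

resolved = deep_merge(package_defaults, user_config)
print(resolved)
```

Because the merge happens before validation, a user file only needs to list the keys it changes; the validated model still sees a complete configuration.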
Provenance and Reproducibility
Every dataset produced by canVODpy is fully traceable:
Full provenance chain
| Field | Source |
|---|---|
| ds.attrs["File Hash"] | SHA-256 of raw input file |
| ds.attrs["Software"] | canvod-readers x.y.z |
| ds.attrs["Created"] | ISO 8601 timestamp |
| Icechunk snapshot ID | Hash-addressable, immutable |
| Config version | Committed alongside code |
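The first three rows of the table can be produced with the standard library alone. The following is a minimal sketch, assuming the attribute keys shown in the table; the helper names `file_sha256` and `provenance_attrs` are hypothetical, and the way canVODpy actually computes these values may differ.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large inputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_attrs(path: Path, software: str) -> dict[str, str]:
    """Build dataset attributes using the key names from the table above."""
    return {
        "File Hash": file_sha256(path),
        "Software": software,
        # Timezone-aware UTC timestamp in ISO 8601 format.
        "Created": datetime.now(timezone.utc).isoformat(),
    }
```

A dictionary like this could be passed to `ds.attrs.update(...)` on an xarray dataset; the snapshot ID and config version come from the store and the repository and are not covered here.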
Airflow / Distributed Execution
Level 1 API functions are stateless and suitable for distributed scheduling:
from airflow.decorators import task


@task
def process_rinex_task(file_path: str, date: str) -> str:
    # Imported inside the task so the import runs on the worker
    from canvodpy import read_rinex

    obs = read_rinex(file_path, date)
    out_path = f"/data/obs_{date}.zarr"
    obs.to_zarr(out_path)
    return out_path
Factory registration happens at module import time, so each worker process that imports the registering module has access to all registered implementations automatically.