
Architecture and Design Patterns

This page documents the design principles and extensibility patterns used throughout canVODpy.


Design Principles

  •   Modularity

    Independent packages with minimal coupling. Install only what you need; replace only what you must.

  •   Extensibility

    ABC + Factory pattern throughout. Add a custom reader, grid, or VOD algorithm in fewer than 50 lines, with no framework internals to understand.

  •   Type Safety

    Modern Python type hints and Pydantic validation at every boundary. Errors surface at construction time, not during analysis.

  •   Scientific Focus

    Explicit over implicit. Reproducible by default: every dataset is traceable to a config version, file hash, and Icechunk snapshot ID.


The Sollbruchstellen Principle

canVODpy applies the engineering concept of Sollbruchstellen (predetermined breaking points): packages are designed to be independent so they can be used separately or replaced without affecting the rest of the system.

Foundation (0 inter-package dependencies):
  canvod-readers, canvod-grids, canvod-vod, canvod-utils,
  canvod-virtualiconvname

Consumer (1–2 dependencies each):
  canvod-auxiliary      → canvod-readers
  canvod-viz            → canvod-grids
  canvod-store          → canvod-grids
  canvod-store-metadata → canvod-utils
  canvod-ops            → canvod-grids, canvod-utils

Orchestration:
  canvodpy              → all packages
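
The layering above can be checked mechanically. The following is a minimal sketch, not part of canVODpy: it copies the dependency listing into a plain dictionary and uses the standard-library `graphlib` module to confirm that the graph is acyclic and to derive an install order. The names `DEPENDENCIES` and `dependents_of` are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter

# Inter-package dependencies copied from the listing above.
# Foundation packages map to an empty set.
DEPENDENCIES: dict[str, set[str]] = {
    "canvod-readers": set(),
    "canvod-grids": set(),
    "canvod-vod": set(),
    "canvod-utils": set(),
    "canvod-virtualiconvname": set(),
    "canvod-auxiliary": {"canvod-readers"},
    "canvod-viz": {"canvod-grids"},
    "canvod-store": {"canvod-grids"},
    "canvod-store-metadata": {"canvod-utils"},
    "canvod-ops": {"canvod-grids", "canvod-utils"},
}
# The orchestration package depends on every other package.
DEPENDENCIES["canvodpy"] = set(DEPENDENCIES)


def dependents_of(package: str) -> set[str]:
    """Return the packages that directly depend on `package`."""
    return {name for name, deps in DEPENDENCIES.items() if package in deps}


# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so producing a list at all confirms the layering is acyclic.
install_order = list(TopologicalSorter(DEPENDENCIES).static_order())
foundation = sorted(name for name, deps in DEPENDENCIES.items() if not deps)
```

Here `dependents_of("canvod-vod")` contains only `canvodpy`, which illustrates the breaking-point idea: replacing the VOD package touches nothing but the orchestration layer.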

ABC + Factory Pattern

flowchart TD
    subgraph ABCS["Abstract Base Classes"]
        READER_ABC["`**GNSSDataReader**
        to_ds, iter_epochs, file_hash`"]
        GRID_ABC["`**BaseGridBuilder**
        build`"]
        VOD_ABC["`**VODCalculator**
        calculate_vod`"]
    end

    subgraph FACTORIES["Factory Registry"]
        RF["ReaderFactory"]
        GF["GridFactory"]
        VF["VODFactory"]
    end

    subgraph BUILTIN["Built-in"]
        RINEX3["Rnxv3Obs"]
        EA["EqualAreaBuilder"]
        HP["HEALPixBuilder"]
        TO["TauOmegaZerothOrder"]
    end

    subgraph CUSTOM["User Extension"]
        IMPL["`**Custom class**
        inherits ABC`"]
        REG["Factory.register()"]
    end

    READER_ABC --> RF
    GRID_ABC   --> GF
    VOD_ABC    --> VF

    RINEX3 --> RF
    EA     --> GF
    HP     --> GF
    TO     --> VF

    IMPL --> REG --> RF

Registration + Usage

import xarray as xr  # needed for the xr.Dataset return annotation
from pydantic import ConfigDict
from canvodpy import ReaderFactory
from canvod.readers import GNSSDataReader
from canvod.readers.builder import DatasetBuilder

class MyLabReader(GNSSDataReader):
    """GNSSDataReader is a Pydantic BaseModel + ABC — one parent is enough."""

    model_config = ConfigDict(frozen=True)
    # fpath is inherited from GNSSDataReader — no need to redeclare

    def to_ds(self, keep_data_vars=None, **kwargs) -> xr.Dataset:
        builder = DatasetBuilder(self)
        for epoch in self.iter_epochs():
            ei = builder.add_epoch(epoch.timestamp)
            for obs in epoch.observations:
                sig = builder.add_signal(sv=obs.sv, band=obs.band, code=obs.code)
                builder.set_value(ei, sig, "SNR", obs.snr)
        return builder.build(keep_data_vars=keep_data_vars)

    # ... implement remaining abstract methods ...

# Register once (at import time)
ReaderFactory.register("mylab_v1", MyLabReader)

# Use anywhere (`path` is the location of an input file, defined by the caller)
reader = ReaderFactory.create("mylab_v1", fpath=path)
ds = reader.to_ds()
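
To make the mechanism behind `register()` and `create()` concrete, here is a self-contained sketch of an ABC plus a name-based registry. It is an illustration of the general pattern only and is not canVODpy's implementation: `Reader`, `SimpleReaderFactory`, and `DummyReader` are hypothetical names, and the real `ReaderFactory` may validate and construct classes differently.

```python
from abc import ABC, abstractmethod
from typing import ClassVar


class Reader(ABC):
    """Hypothetical stand-in for an abstract base class such as GNSSDataReader."""

    def __init__(self, fpath: str) -> None:
        self.fpath = fpath

    @abstractmethod
    def to_ds(self) -> dict:
        """Return the parsed data (a plain dict here, for illustration only)."""


class SimpleReaderFactory:
    """Hypothetical registry that maps names to Reader subclasses."""

    _registry: ClassVar[dict[str, type[Reader]]] = {}

    @classmethod
    def register(cls, name: str, reader_cls: type[Reader]) -> None:
        # Reject anything that does not inherit from the ABC.
        if not (isinstance(reader_cls, type) and issubclass(reader_cls, Reader)):
            raise TypeError(f"{reader_cls!r} must inherit from Reader")
        cls._registry[name] = reader_cls

    @classmethod
    def create(cls, name: str, **kwargs) -> Reader:
        try:
            reader_cls = cls._registry[name]
        except KeyError:
            known = ", ".join(sorted(cls._registry)) or "none"
            raise ValueError(f"Unknown reader {name!r}; registered: {known}") from None
        # Instantiating fails with TypeError if abstract methods are missing.
        return reader_cls(**kwargs)


class DummyReader(Reader):
    def to_ds(self) -> dict:
        return {"source": self.fpath, "SNR": [42.0]}


SimpleReaderFactory.register("dummy_v1", DummyReader)
reader = SimpleReaderFactory.create("dummy_v1", fpath="station.25o")
```

Because callers only ever see the registered name, swapping `DummyReader` for another implementation requires no change at the call site.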

Unified API Surface

canvodpy exposes four API levels, all backed by the same packages.

Level 1: convenience functions

from canvodpy import process_date, calculate_vod

data = process_date("Rosalia", "2025001")
vod  = calculate_vod("Rosalia", "canopy_01", "reference_01", "2025001")

Level 2: fluent workflow

Steps are recorded but not executed until a terminal method is called.

import canvodpy

result = (canvodpy.workflow("Rosalia")
    .read("2025001")
    .preprocess()
    .grid("equal_area", angular_resolution=5.0)
    .vod("canopy_01", "reference_01")
    .result())

# Preview without executing
plan = canvodpy.workflow("Rosalia").read("2025001").preprocess().explain()

Level 3: VODWorkflow class

from canvodpy import VODWorkflow

wf       = VODWorkflow(site="Rosalia", grid="equal_area")
datasets = wf.process_date("2025001")
vod      = wf.calculate_vod("canopy_01", "reference_01", "2025001")

Level 4: direct package access

from pathlib import Path

from canvod.readers import Rnxv3Obs
from canvod.grids   import EqualAreaBuilder
from canvod.vod     import TauOmegaZerothOrder

reader = Rnxv3Obs(fpath=Path("station.25o"))
ds = reader.to_ds(keep_data_vars=["SNR"])
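
The deferred execution of the fluent workflow (steps are recorded, then run by a terminal method) can be sketched with a small builder class. This is an assumption-laden illustration, not canvodpy's workflow class: `LazyWorkflow` and its internals are hypothetical, and only the method names `read`, `preprocess`, `explain`, and `result` are taken from the examples above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class LazyWorkflow:
    """Hypothetical deferred-execution builder; not canvodpy's real class."""

    site: str
    _steps: list[tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def _add(self, label: str, func: Callable[[Any], Any]) -> "LazyWorkflow":
        # Record the step; nothing is computed yet.
        self._steps.append((label, func))
        return self  # returning self enables method chaining

    def read(self, date: str) -> "LazyWorkflow":
        return self._add(f"read({date!r})", lambda _: {"site": self.site, "date": date})

    def preprocess(self) -> "LazyWorkflow":
        return self._add("preprocess()", lambda data: {**data, "preprocessed": True})

    def explain(self) -> list[str]:
        """Terminal method: list the recorded steps without running them."""
        return [label for label, _ in self._steps]

    def result(self) -> Any:
        """Terminal method: run every recorded step in order."""
        data: Any = None
        for _, func in self._steps:
            data = func(data)
        return data


wf = LazyWorkflow("Rosalia").read("2025001").preprocess()
plan = wf.explain()    # inspects the plan only
output = wf.result()   # executes the recorded steps
```

Separating recording from execution is what makes a preview such as `explain()` cheap: it only reads the list of recorded steps.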

Configuration Management

flowchart TD
    subgraph FILES["YAML Files"]
        PROC["processing.yaml"]
        SITES["sites.yaml"]
        SIDS["sids.yaml"]
        DEF["Package defaults"]
    end

    subgraph LOAD["Loader"]
        MERGE["`**Deep merge**
        user overrides defaults`"]
        PYDANTIC["Pydantic validation"]
    end

    subgraph VALIDATED["CanvodConfig"]
        PC["ProcessingConfig"]
        SC["SitesConfig"]
        SIC["SidsConfig"]
    end

    PROC & SITES & SIDS & DEF --> MERGE --> PYDANTIC
    PYDANTIC --> PC & SC & SIC

Load the validated configuration from Python:

from canvod.utils.config import load_config

cfg = load_config()
cfg.processing.aux_data.nasa_earthdata_acc_mail
cfg.processing.storage.stores_root_dir

Manage the configuration files from the command line:

just config-init      # Create config files from templates
just config-validate  # Validate current config
just config-show      # Print resolved config
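
The "deep merge, user overrides defaults" step in the diagram can be illustrated with a short recursive function. This is a sketch under assumptions, not the loader's actual code: `deep_merge` is a hypothetical helper, the values are invented, and the `compression` key is made up for the example. Only `stores_root_dir` and `nasa_earthdata_acc_mail` appear on this page.

```python
from copy import deepcopy
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `overrides` wins over `defaults`, key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists replace the default wholesale.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical package defaults and user file contents.
package_defaults = {
    "storage": {"stores_root_dir": "/tmp/stores", "compression": "zstd"},
    "aux_data": {"nasa_earthdata_acc_mail": None},
}
user_processing = {"storage": {"stores_root_dir": "/data/stores"}}

resolved = deep_merge(package_defaults, user_processing)
```

The user file only needs to name the keys it changes; untouched defaults such as the hypothetical `compression` setting survive the merge, and the merged dictionary is what would then be handed to validation.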

Provenance and Reproducibility

Every dataset produced by canVODpy is fully traceable:

Full provenance chain

Field                    Source
ds.attrs["File Hash"]    SHA-256 of raw input file
ds.attrs["Software"]     canvod-readers x.y.z
ds.attrs["Created"]      ISO 8601 timestamp
Icechunk snapshot ID     Hash-addressable, immutable
Config version           Committed alongside code
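
As an illustration of how the three dataset attributes in the table could be produced, the sketch below hashes a file with SHA-256 and builds an attribute dictionary with an ISO 8601 timestamp. The helper names `sha256_of_file` and `provenance_attrs` are hypothetical, and the readers may compute these fields differently; only the attribute keys come from the table.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large inputs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_attrs(path: Path, software: str) -> dict[str, str]:
    """Build the attribute dictionary using the keys listed in the table above."""
    return {
        "File Hash": sha256_of_file(path),
        "Software": software,
        "Created": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
    }


# Demonstration on a throwaway file with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "station.25o"
    sample.write_bytes(b"dummy RINEX content")
    attrs = provenance_attrs(sample, software="canvod-readers x.y.z")
```

Because the hash depends only on the file's bytes, re-reading an unchanged input yields the same `File Hash`, which is what lets a dataset be matched back to its raw file.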

Airflow / Distributed Execution

Level 1 API functions are stateless and suitable for distributed scheduling:

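from airflow.decorators import task

@task
def process_rinex_task(file_path: str, date: str) -> str:
    # Import inside the task so the import (and factory registration)
    # happens in the worker process that runs it.
    from canvodpy import read_rinex

    out_path = f"/data/obs_{date}.zarr"
    obs = read_rinex(file_path, date)
    obs.to_zarr(out_path)
    # Return the store path rather than the data itself.
    return out_path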

Factory registration happens at module import time — each worker process has access to all registered implementations automatically.