Reader Architecture
This page describes the architectural principles behind canvod-readers, focusing on the GNSSDataReader base class. GNSSDataReader combines a Pydantic BaseModel with Python's Abstract Base Class (ABC) pattern so that reader implementations are extensible, type-safe, and consistent with one another.
The GNSSDataReader Base Class
GNSSDataReader inherits from both pydantic.BaseModel and abc.ABC. This combination provides:
- Contract enforcement: every reader must implement the required abstract methods.
- Automatic validation: Pydantic checks that the file path exists at construction time.
- Type safety: annotations serve static type checkers and are enforced at runtime by Pydantic.
- Simplified inheritance: subclasses inherit fpath, file validation, and model_config for free.
- Self-documenting interface: the abstract methods define the exact contract.
The class is located in canvod/readers/base.py:
```python
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path

import xarray as xr
from pydantic import BaseModel, ConfigDict, field_validator


class GNSSDataReader(BaseModel, ABC):
    """Abstract base for all GNSS data format readers.

    All readers must:
    1. Inherit from this class (no need for separate BaseModel)
    2. Implement all abstract methods
    3. Return xarray.Dataset that passes validate_dataset()
    4. Provide file hash for deduplication
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path  # Validated at construction time

    @field_validator("fpath")
    @classmethod
    def _validate_fpath(cls, v: Path) -> Path:
        v = Path(v)
        if not v.is_file():
            raise FileNotFoundError(f"File not found: {v}")
        return v

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """SHA256 hash of file for deduplication."""

    @abstractmethod
    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs,
    ) -> xr.Dataset:
        """Convert data to xarray.Dataset."""

    @abstractmethod
    def iter_epochs(self):
        """Iterate over epochs in file."""

    @property
    @abstractmethod
    def start_time(self) -> datetime:
        """Start time of observations."""

    @property
    @abstractmethod
    def end_time(self) -> datetime:
        """End time of observations."""

    @property
    @abstractmethod
    def systems(self) -> list[str]:
        """GNSS systems in file."""

    @property
    @abstractmethod
    def num_satellites(self) -> int:
        """Number of unique satellites."""

    def to_ds_and_auxiliary(
        self, **kwargs
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Obs dataset + any auxiliary datasets. Default: empty aux dict."""
        return self.to_ds(**kwargs), {}
```
Why BaseModel + ABC?
Before this design, every reader needed multiple inheritance
(GNSSDataReader, BaseModel), a duplicate fpath: Path field,
model_config = ConfigDict(arbitrary_types_allowed=True), and
its own fpath validator. Moving all of this into the base class
means a new reader starts from just:
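```python
from pydantic import ConfigDict

from canvod.readers.base import GNSSDataReader


class MyReader(GNSSDataReader):
    model_config = ConfigDict(frozen=True)
    # fpath is inherited; no need to redeclare it.
    # The abstract members (file_hash, to_ds, iter_epochs, ...) must still
    # be implemented before MyReader can be instantiated.
```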
Contract Guarantees
Any implementation of GNSSDataReader guarantees the following:
- File validation: fpath is validated at construction time; a FileNotFoundError is raised if the file does not exist.
- Hash computation: file_hash returns a deterministic identifier for storage deduplication.
- Dataset conversion: to_ds() returns a validated xarray.Dataset with (epoch, sid) dimensions.
- Iteration: iter_epochs() yields epoch-by-epoch data for memory-bounded streaming.
- Metadata: properties provide the time range, systems, and satellite count.
- Validation: output passes validate_dataset(), which DatasetBuilder checks automatically.
SignalID: Validated Signal Identifiers
Signal identifiers ("G01|L1|C") are the backbone of the sid dimension. Instead of building them as raw f-strings, the SignalID Pydantic model validates each component at creation time:
```python
from canvod.readers.base import SignalID

# Create from components: validated immediately
sig = SignalID(sv="G01", band="L1", code="C")
sig.sid     # → "G01|L1|C"
sig.system  # → "G"
str(sig)    # → "G01|L1|C"

# Parse from string
sig2 = SignalID.from_string("E25|E5a|I")

# Invalid SVs are rejected at construction
SignalID(sv="X01", band="L1", code="C")  # raises ValueError
```
SignalID is a frozen Pydantic model — immutable and hashable. It validates the SV against SV_PATTERN (system letter G|R|E|C|J|S|I + 2-digit PRN).
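The same validation rules can be mirrored with a frozen dataclass. This is a hypothetical stdlib analogue (SignalIDSketch is not the library class). It assumes SV_PATTERN means exactly one system letter from G, R, E, C, J, S, I followed by two digits, and that a signal ID string has exactly three |-separated parts.

```python
import re
from dataclasses import dataclass

# Assumed equivalent of SV_PATTERN: one system letter + two-digit PRN.
SV_PATTERN = re.compile(r"[GRECJSI]\d{2}")


@dataclass(frozen=True)
class SignalIDSketch:
    """Hypothetical stdlib analogue of SignalID: frozen, hashable, validated."""

    sv: str
    band: str
    code: str

    def __post_init__(self) -> None:
        # fullmatch rejects trailing characters such as "G01x" or "G01\n".
        if not SV_PATTERN.fullmatch(self.sv):
            raise ValueError(f"Invalid SV: {self.sv!r}")
        if not self.band or not self.code:
            raise ValueError("band and code must be non-empty")

    @property
    def sid(self) -> str:
        return f"{self.sv}|{self.band}|{self.code}"

    @property
    def system(self) -> str:
        return self.sv[0]

    def __str__(self) -> str:
        return self.sid

    @classmethod
    def from_string(cls, sid: str) -> "SignalIDSketch":
        parts = sid.split("|")
        if len(parts) != 3:
            raise ValueError(f"Expected 'SV|band|code', got {sid!r}")
        return cls(*parts)
```

Because the dataclass is frozen and compares by value, two identifiers built from the same components collapse to one entry in a set or dict key, which is what makes such objects usable as lookup keys.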
DatasetBuilder: Guided Dataset Construction
The DatasetBuilder helper eliminates the ~30 lines of manual numpy/xarray coordinate assembly that every reader previously needed:
```python
from canvod.readers.builder import DatasetBuilder

builder = DatasetBuilder(reader)
for epoch in reader.iter_epochs():
    ei = builder.add_epoch(epoch.timestamp)
    for obs in epoch.observations:
        sig = builder.add_signal(sv=obs.sv, band=obs.band, code=obs.code)
        builder.set_value(ei, sig, "SNR", obs.snr)
ds = builder.build()  # validated Dataset
```
The builder handles:
- Coordinate arrays: sid, sv, system, band, and code from registered signals
- Frequency resolution: freq_center, freq_min, and freq_max via SignalIDMapper
- Dtype enforcement: float32 for frequencies, correct dtypes for each variable
- CF-compliant metadata: from COORDS_METADATA, SNR_METADATA, and OBSERVABLES_METADATA
- Global attributes: from reader._build_attrs() plus optional extra_attrs
- Validation: calls validate_dataset() before returning
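The accumulate-then-build idea behind DatasetBuilder can be sketched without numpy or xarray. GridBuilderSketch is hypothetical: it keeps only the add_epoch / add_signal / set_value / build shape described above, returns a plain sid string instead of a SignalID and a plain dict instead of an xr.Dataset, and skips frequency resolution, dtype enforcement, metadata, and validation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GridBuilderSketch:
    """Hypothetical illustration of accumulating cells, then building a grid."""

    epochs: list[str] = field(default_factory=list)
    sids: dict[str, int] = field(default_factory=dict)  # sid -> column index
    values: dict[str, dict[tuple[int, int], float]] = field(default_factory=dict)

    def add_epoch(self, timestamp: str) -> int:
        self.epochs.append(timestamp)
        return len(self.epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> str:
        sid = f"{sv}|{band}|{code}"
        # Register each signal once; its column index is its first-seen order.
        self.sids.setdefault(sid, len(self.sids))
        return sid

    def set_value(self, epoch_idx: int, sid: str, var: str, value: float) -> None:
        self.values.setdefault(var, {})[(epoch_idx, self.sids[sid])] = value

    def build(self) -> dict:
        n_epoch, n_sid = len(self.epochs), len(self.sids)
        data_vars = {}
        for var, cells in self.values.items():
            # Dense (epoch, sid) grid sized only now; unobserved cells are NaN.
            grid = [[math.nan] * n_sid for _ in range(n_epoch)]
            for (ei, si), val in cells.items():
                grid[ei][si] = val
            data_vars[var] = grid
        return {
            "dims": ("epoch", "sid"),
            "coords": {"epoch": list(self.epochs), "sid": list(self.sids)},
            "data_vars": data_vars,
        }
```

Deferring allocation to build() means a signal first seen late in the file still gets a full column, with NaN for the epochs where it was not observed.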
Layered Architecture
```mermaid
graph TB
    subgraph "User Layer"
        A[User Code]
    end

    subgraph "Interface Layer"
        B["`**GNSSDataReader**
        BaseModel + ABC`"]
        B2[SignalID]
        C[validate_dataset]
    end

    subgraph "Builder Layer"
        BLD[DatasetBuilder]
    end

    subgraph "Implementation Layer"
        D[Rnxv3Obs]
        D2[SbfReader]
        E[Future: Rnxv2Obs]
    end

    subgraph "Support Layer"
        G[gnss_specs<br/>Constellation definitions]
        H[SignalIDMapper]
        I[Metadata]
    end

    subgraph "Model Layer"
        J[Pydantic Models<br/>Type-safe parsing]
    end

    A --> B
    D -.implements.-> B
    D2 -.implements.-> B
    E -.implements.-> B
    D --> BLD
    D2 --> BLD
    BLD --> B2
    BLD --> H
    BLD --> I
    BLD --> C
    D --> J
    D --> G
    D2 --> G
```
Layer Responsibilities
User Layer -- Instantiates readers, calls to_ds() or iter_epochs(), and operates on returned Datasets.
Interface Layer (BaseModel + ABC) -- Defines required methods, enforces contracts via Pydantic validation, provides SignalID for type-safe signal identifiers, and offers validate_dataset() for output validation.
Builder Layer -- DatasetBuilder handles coordinate assembly, frequency resolution, dtype enforcement, and validation. Readers delegate Dataset construction to the builder instead of assembling arrays manually.
Implementation Layer (Concrete Readers) -- Parses specific formats, implements abstract methods, and handles format-specific details. Rnxv3Obs reads RINEX v3.04 text; SbfReader reads Septentrio Binary Format with embedded satellite geometry.
Support Layer -- Provides constellation specifications (GPS, Galileo, etc.), Signal ID mapping, and metadata templates.
Model Layer (Pydantic) -- Supplies type-safe data models with automatic validation for parsing RINEX headers and epoch records.
Component Interactions
Parsing Flow (with DatasetBuilder)
```mermaid
sequenceDiagram
    participant User
    participant Reader as Rnxv3Obs<br/>(GNSSDataReader)
    participant Header as Rnxv3Header<br/>(Pydantic)
    participant Builder as DatasetBuilder
    participant SigID as SignalID
    participant Mapper as SignalIDMapper
    participant Validator

    User->>Reader: Rnxv3Obs(fpath=path)
    activate Reader
    Note right of Reader: fpath validated by<br/>BaseModel field_validator
    Reader->>Header: Parse header section
    Header-->>Reader: Validated header data
    deactivate Reader

    User->>Reader: to_ds()
    activate Reader
    Reader->>Builder: DatasetBuilder(reader)

    loop For each epoch
        Reader->>Builder: add_epoch(timestamp)
        Builder-->>Reader: epoch_idx
        loop For each observation
            Reader->>Builder: add_signal(sv, band, code)
            Builder->>SigID: SignalID(sv, band, code)
            Note right of SigID: SV validated against<br/>SV_PATTERN
            SigID-->>Builder: validated SignalID
            Builder-->>Reader: SignalID
            Reader->>Builder: set_value(ei, sig, "SNR", val)
        end
    end

    Reader->>Builder: build()
    Builder->>Mapper: resolve frequencies
    Mapper-->>Builder: freq_center, freq_min, freq_max
    Builder->>Validator: validate_dataset(ds)
    Validator-->>Builder: passes
    Builder-->>Reader: xr.Dataset
    Reader-->>User: xr.Dataset
    deactivate Reader
```
Key interactions in this flow:
- Pydantic's field_validator checks file existence at construction time.
- SignalID validates each signal identifier (SV format, system letter).
- DatasetBuilder accumulates epochs, signals, and values, then constructs the xarray structure.
- SignalIDMapper resolves band names to center frequencies and bandwidths.
- validate_dataset() ensures the output meets the structural contract.
Design Principles
- Early Validation: Errors discovered during analysis are expensive to diagnose. Validation happens at parse time via Pydantic, so invalid headers, wrong RINEX versions, and bad dtypes fail immediately with structured error messages.
- Immutability: Readers such as Rnxv3Obs are frozen=True Pydantic models. Once constructed, reader.fpath cannot be reassigned, which keeps behavior predictable, thread-safe, and cacheable. SbfReader instead uses frozen=False with @cached_property for lazy computation.
- Separation of Concerns: Format-specific parsing (for example, RINEX text) is contained within the reader. Generic processing (Signal ID mapping, coordinate transforms, validation) lives in shared helpers used by all readers.
- Mandatory Validation: Every Dataset must pass validate_dataset() before being returned. Because the check runs at the end of every to_ds() (DatasetBuilder.build() also calls it), it cannot be skipped by accident.
Early Validation with Pydantic
Errors discovered during analysis are expensive to diagnose and correct. Validation is therefore performed during parsing:
```python
from pathlib import Path

from pydantic import BaseModel, ConfigDict, field_validator


class Rnxv3Header(BaseModel):
    """RINEX v3 header with automatic validation (simplified)."""

    model_config = ConfigDict(frozen=True, arbitrary_types_allowed=True)

    fpath: Path
    version: float
    filetype: str
    systems: str
    obs_codes_per_system: dict[str, list[str]]
    # ... 20+ additional fields parsed from header

    @field_validator("version")
    @classmethod
    def check_version(cls, v: float) -> float:
        if not (3.0 <= v < 4.0):
            raise ValueError(f"Expected RINEX v3, got {v}")
        return v
```
This approach catches errors at parse time with clear, structured error messages and provides type safety throughout the codebase.
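Without Pydantic, the same fail-at-construction behavior can be approximated with a dataclass __post_init__. HeaderVersionSketch is hypothetical. It assumes the version is read from columns 1 to 9 of the first header line (where RINEX stores the format version) and coerced to float, roughly what Pydantic does when validating a version: float field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderVersionSketch:
    """Hypothetical stdlib analogue of Rnxv3Header's version check."""

    version: float

    def __post_init__(self) -> None:
        # Fail at construction, like the field_validator above.
        if not (3.0 <= self.version < 4.0):
            raise ValueError(f"Expected RINEX v3, got {self.version}")

    @classmethod
    def from_first_line(cls, line: str) -> "HeaderVersionSketch":
        # RINEX stores the format version in columns 1-9 of the first line.
        field_text = line[:9]
        try:
            version = float(field_text)
        except ValueError as exc:
            raise ValueError(f"Unparseable version field: {field_text!r}") from exc
        return cls(version=version)
```

Both failure modes (text that is not a number, and a number outside the v3 range) surface as a ValueError at the moment the header object is created, long before any observation data is touched.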
Immutability
Readers configured with frozen=True, such as Rnxv3Obs, cannot be modified once created:
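```python
from pydantic import ConfigDict

from canvod.readers.base import GNSSDataReader


class Rnxv3Obs(GNSSDataReader):
    """Immutable after initialization."""

    model_config = ConfigDict(frozen=True)
    # fpath is inherited from GNSSDataReader; no need to redeclare it.


# Assigning to a field of a frozen Pydantic model raises
# pydantic.ValidationError (error type "frozen_instance"):
# reader.fpath = new_path  # raises ValidationError
```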
Immutability ensures predictable behavior, thread safety, and cacheable results.
Frozen is optional
The base class does not set frozen=True — subclasses choose.
Rnxv3Obs uses frozen=True (fully immutable), while SbfReader
uses frozen=False (allows @cached_property for lazy computation).
Separation of Format and Processing
Format-specific code is contained within the reader:
```python
import xarray as xr


# In Rnxv3Obs: format-specific fast parsing (bodies elided)
def _parse_obs_fast(slice_text: str) -> tuple[float | None, int | None, int | None]:
    """Inline RINEX v3 observation extraction (no Pydantic overhead)."""
    ...


def _create_dataset_single_pass(self) -> xr.Dataset:
    """Single-pass: header-derived SIDs → pre-allocated arrays → one file scan."""
    ...
```
Generic processing is handled by shared helpers:
```python
# In gnss_specs: band property lookups shared across readers
mapper = SignalIDMapper()
freq = mapper.get_band_frequency("L1")  # 1575.42
bw = mapper.get_band_bandwidth("L1")  # 30.69
grp = mapper.get_overlapping_group("L1")  # "group_1"
```
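The freq_min / freq_max coordinates, and the band overlap check they enable, can be sketched as interval arithmetic. Everything here is an illustrative assumption: units are taken to be MHz, each band is assumed symmetric about its center, and only the L1 numbers come from the example above. E1 shares the 1575.42 MHz center with L1 and L2 is centered at 1227.60 MHz, but their bandwidths below are placeholders. The sketch does not reproduce SignalIDMapper.get_overlapping_group, whose grouping logic is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandSpec:
    center_mhz: float
    bandwidth_mhz: float

    @property
    def freq_min(self) -> float:
        # Assumption: the band is symmetric about its center frequency.
        return self.center_mhz - self.bandwidth_mhz / 2

    @property
    def freq_max(self) -> float:
        return self.center_mhz + self.bandwidth_mhz / 2


def bands_overlap(a: BandSpec, b: BandSpec) -> bool:
    """Closed intervals overlap unless one ends before the other starts."""
    return a.freq_min <= b.freq_max and b.freq_min <= a.freq_max


BANDS = {
    "L1": BandSpec(1575.42, 30.69),  # values from the example above
    "E1": BandSpec(1575.42, 24.0),   # bandwidth is a placeholder
    "L2": BandSpec(1227.60, 20.0),   # bandwidth is a placeholder
}
```

With these numbers, L1 and E1 overlap (same center) while L1 and L2 are far apart, which is the kind of distinction the freq_* coordinates are meant to support.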
Explicit Configuration
Configuration is always explicit:
```python
# Explicit parameter specifying which variables to retain
ds = reader.to_ds(keep_data_vars=["SNR", "Phase"])
```
Mandatory Validation
Every Dataset must be validated before it is returned:
```python
def to_ds(self, **kwargs) -> xr.Dataset:
    """Convert to Dataset."""
    ds = self._build_dataset(**kwargs)
    # Validation is mandatory, not optional
    validate_dataset(ds)
    return ds
```
validate_dataset() Function
The validate_dataset() function ensures all readers produce compatible output.
It collects all violations and raises a single ValueError listing every problem.
```python
from canvod.readers.base import validate_dataset

# Checks dimensions, coordinates (with dtypes), data variables, and attributes.
# Raises ValueError listing ALL violations at once.
validate_dataset(ds)

# Optionally specify required data variables (default: ["SNR"])
validate_dataset(ds, required_vars=["SNR", "Phase"])
```
The function checks:
- Dimensions: (epoch, sid) must exist
- Coordinates: all required coordinates are present with correct dtypes
- Data variables: required variables exist with (epoch, sid) dimensions
- Attributes: required global attributes (Created, Software, Institution, File Hash) are present
Rationale for Structural Requirements
Dimensions (epoch, sid) -- Standardizes time series structure, enables efficient indexing and slicing, and maintains compatibility with xarray operations.
Coordinates -- freq_* coordinates are required for band overlap detection. system, band, and code enable constellation- and signal-level filtering. sv tracks individual satellites.
Attributes -- "File Hash" prevents duplicate ingestion in storage. Other metadata attributes support provenance tracking and reproducibility.
ReaderFactory Pattern
The canvodpy.ReaderFactory lives in the umbrella package and provides
name-based reader creation plus RINEX auto-detection. It is part of the
generic ComponentFactory family (alongside GridFactory, VODFactory,
AugmentationFactory).
```python
from canvodpy import ReaderFactory

# Name-based creation (works for all registered readers)
reader = ReaderFactory.create("rinex3", fpath="station.25o")
reader = ReaderFactory.create("sbf", fpath="station.25_")

# Auto-detect RINEX v2/v3 from file header
reader = ReaderFactory.create_from_file("station.25o")

# Register a custom reader
ReaderFactory.register("my_format", MyFormatReader)
reader = ReaderFactory.create("my_format", fpath="data.myf")
```
Auto-detection scope
create_from_file() auto-detects RINEX v2/v3 from the first 9
characters of the file header. SBF and other binary formats should
use the name-based API: ReaderFactory.create("sbf", fpath=path).
Summary
The canvod-readers architecture is characterized by:
- Unified inheritance: GNSSDataReader(BaseModel, ABC) provides file validation, fpath, and model_config out of the box, so new readers need only one parent class.
- Validated signal identifiers: SignalID catches invalid SVs and malformed signal IDs at creation time, not during analysis.
- Guided Dataset construction: DatasetBuilder handles coordinate arrays, frequency resolution, dtype enforcement, and validation automatically.
- Contract enforcement through the ABC, ensuring consistent behavior across all readers.
- Type safety via Pydantic, catching errors during parsing.
- Structural validation through validate_dataset(), ensuring downstream compatibility.
- Extensibility: new formats can be added in ~30 lines without modifying existing code. See Building a Reader for a step-by-step guide.