# canvod-readers

## Purpose
The canvod-readers package provides validated parsers for GNSS observation data. It transforms raw receiver files into analysis-ready xarray Datasets, serving as the data ingestion layer for GNSS Transmissometry (GNSS-T) analysis.
- **RINEX v3.04** (`Rnxv3Obs`): Text-based, all-GNSS standard format. Satellite geometry requires external SP3 + CLK precise ephemerides.
- **SBF binary** (`SbfReader`): Septentrio binary telemetry. Satellite geometry, PVT quality, DOP, and receiver health are embedded, so no ephemeris download is needed.
## Supported Formats at a Glance
| Feature | `Rnxv3Obs` | `SbfReader` |
|---|---|---|
| Format | Plain text | Binary |
| Extension | `.rnx` | `.sbf` |
| Satellite geometry (θ, φ) | SP3 download | Embedded |
| Extra metadata | Header only | PVT · DOP · quality |
| `to_ds()` | ✓ | ✓ |
| `iter_epochs()` | ✓ | ✓ |
| `to_metadata_ds()` | ✗ | ✓ |
| `to_ds_and_auxiliary()` | `{}` (empty aux dict) | `{"sbf_obs": meta_ds}` |
**Drop-in replacement:** Both readers produce structurally identical (epoch × sid) xarray Datasets that pass `validate_dataset()`, so downstream code is reader-agnostic.
## Design

### Data flow
```mermaid
graph TD
    A1["RINEX v3 File (.rnx)"] --> B1["Rnxv3Obs (+ SP3/CLK)"]
    A2["SBF File (.sbf)"] --> B2["SbfReader"]
    B1 --> C["validate_dataset()"]
    B2 --> C
    C --> D["`**xarray.Dataset**
epoch x sid`"]
    B2 --> E["`**Metadata Dataset**
DOP, PVT, theta, phi`"]
    D --> F["Downstream Analysis"]
    E --> F
```
### Contract-Based Design

All readers subclass `GNSSDataReader`, a Pydantic `BaseModel` that is also an `ABC`. It provides file-path validation, model configuration, and a consistent interface:
```python
from abc import ABC, abstractmethod
from pathlib import Path

import xarray as xr
from pydantic import BaseModel, ConfigDict, field_validator


class GNSSDataReader(BaseModel, ABC):
    """Base class for all GNSS data format readers."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path  # Validated at construction time

    @field_validator("fpath")
    @classmethod
    def check_file_exists(cls, v: Path) -> Path:
        # Minimal sketch of the construction-time check;
        # the actual validator may perform additional checks.
        if not v.is_file():
            raise ValueError(f"File not found: {v}")
        return v

    @abstractmethod
    def to_ds(self, **kwargs) -> xr.Dataset:
        """Convert to xarray.Dataset (epoch × sid)."""

    @abstractmethod
    def iter_epochs(self):
        """Iterate through epochs."""

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """SHA-256 hash for deduplication."""

    def to_ds_and_auxiliary(
        self, **kwargs
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Single-pass scan: obs dataset + any auxiliary datasets.

        Default returns an empty aux dict.
        SbfReader overrides this for a one-pass binary decode.
        """
        return self.to_ds(**kwargs), {}
```
Subclasses only need to inherit from `GNSSDataReader`: no separate `BaseModel` import, no `fpath` field, and no file-validation boilerplate are required.
## Usage Examples
**RINEX v3**

```python
from canvod.readers import Rnxv3Obs

reader = Rnxv3Obs(fpath="station.25o")
ds = reader.to_ds(keep_data_vars=["SNR"])

# Keep only signals on the L1, L2, and L5 bands
l_band = ds.where(ds.band.isin(["L1", "L2", "L5"]), drop=True)
```
**SBF**

```python
from canvod.readers.sbf import SbfReader

reader = SbfReader(fpath="rref001a00.25_")
# One binary pass yields the observations plus the SBF metadata dataset
obs_ds, aux = reader.to_ds_and_auxiliary(keep_data_vars=["SNR"])
meta_ds = aux["sbf_obs"]

# Polar angle theta = 90° - elevation, so theta <= 70° keeps elevation >= 20°
snr_filtered = obs_ds["SNR"].where(meta_ds["theta"] <= 70)
```
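**Per-constellation statistics**

```python
# `reader` is any reader instance from the examples above
ds = reader.to_ds()

# Mean SNR for GPS, GLONASS, Galileo, and BeiDou
for system in ["G", "R", "E", "C"]:
    sys_ds = ds.where(ds.system == system, drop=True)
    mean_snr = float(sys_ds.SNR.mean(dim=["epoch", "sid"]))
    print(f"{system}: {mean_snr:.2f} dB")
```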
**Reader factory**

```python
from canvodpy import ReaderFactory

# By name (works for all registered readers)
reader = ReaderFactory.create("rinex3", fpath="station.25o")

# Auto-detect RINEX v2/v3 from the file header
reader = ReaderFactory.create_from_file("station.25o")

# Both produce identical (epoch × sid) datasets
ds = reader.to_ds(keep_data_vars=["SNR"])
```
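**Combining files into a time series**

```python
from pathlib import Path

import xarray as xr
from canvod.readers import Rnxv3Obs

# One dataset per file, concatenated along the epoch dimension
datasets = [
    Rnxv3Obs(fpath=f).to_ds(keep_data_vars=["SNR"])
    for f in sorted(Path("/data/").glob("*.rnx"))
]
time_series = xr.concat(datasets, dim="epoch")
```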
## Key Components
### `SignalID`: validated signal identifiers

Pydantic model for signal identifiers in `SV|band|code` form. It validates the SV against known GNSS systems at creation time, and it is frozen, hashable, and used throughout the builder and readers.

```python
from canvod.readers import SignalID

sig = SignalID(sv="G01", band="L1", code="C")
sig.sid     # → "G01|L1|C"
sig.system  # → "G"
```
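The frozen, hashable behaviour described above can be illustrated without Pydantic. The following is a minimal sketch, not canvod's implementation: `SignalIDSketch` is a hypothetical name, it uses a frozen `dataclass` instead of a Pydantic model, and the set of accepted system letters (the RINEX 3 codes for the six constellations named on this page) is an assumption.

```python
from dataclasses import dataclass

# Assumption: accepted systems are the RINEX 3 letters for the constellations
# named on this page (GPS, GLONASS, Galileo, BeiDou, QZSS, SBAS).
KNOWN_SYSTEMS = frozenset("GRECJS")


@dataclass(frozen=True)
class SignalIDSketch:
    """Hypothetical stand-in for SignalID, in canonical 'SV|band|code' form."""

    sv: str    # e.g. "G01": system letter + two-digit PRN
    band: str  # e.g. "L1"
    code: str  # e.g. "C": RINEX tracking-code attribute

    def __post_init__(self) -> None:
        # Reject unknown systems or malformed SVs at creation time.
        if (
            len(self.sv) != 3
            or self.sv[0] not in KNOWN_SYSTEMS
            or not self.sv[1:].isdigit()
        ):
            raise ValueError(f"invalid SV: {self.sv!r}")

    @property
    def system(self) -> str:
        return self.sv[0]

    @property
    def sid(self) -> str:
        return f"{self.sv}|{self.band}|{self.code}"

    @classmethod
    def from_sid(cls, sid: str) -> "SignalIDSketch":
        sv, band, code = sid.split("|")
        return cls(sv=sv, band=band, code=code)


sig = SignalIDSketch(sv="G01", band="L1", code="C")
print(sig.sid)     # G01|L1|C
print(sig.system)  # G

# frozen=True plus the generated __eq__ makes instances hashable,
# so equal IDs collapse in a set and can serve as dict keys.
print(len({sig, SignalIDSketch.from_sid("G01|L1|C")}))  # 1
```

Because instances are immutable, a SID can safely be used as a dictionary key, for example to map signals to array columns.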
### `DatasetBuilder`: guided dataset construction

Handles coordinate assembly, frequency resolution, dtype enforcement, and validation. Readers call `add_epoch()` → `add_signal()` → `set_value()` → `build()` instead of assembling NumPy/xarray objects by hand.

```python
from canvod.readers.builder import DatasetBuilder

builder = DatasetBuilder(reader)   # `reader`: the reader producing the dataset
ei = builder.add_epoch(timestamp)  # `timestamp`: epoch time parsed from the file
sig = builder.add_signal(sv="G01", band="L1", code="C")
builder.set_value(ei, sig, "SNR", 42.0)
ds = builder.build()  # validated Dataset
```
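The `add_epoch()` → `add_signal()` → `set_value()` → `build()` sequence is a builder pattern: indices are handed out as epochs and signals are registered, values are stored sparsely, and the dense epoch × sid grid is only materialised in `build()`, once both dimensions are known. The hypothetical `GridBuilder` below is a dependency-free sketch of that flow using plain lists; the real `DatasetBuilder` also resolves frequencies, enforces dtypes, and returns a validated xarray Dataset, none of which is reproduced here.

```python
import math


class GridBuilder:
    """Hypothetical, dependency-free sketch of an epoch x sid builder."""

    def __init__(self) -> None:
        self.epochs: list[str] = []
        self.sids: list[str] = []
        self._sid_index: dict[str, int] = {}
        # Sparse storage: (epoch_index, sid_index, var_name) -> value
        self._values: dict[tuple[int, int, str], float] = {}

    def add_epoch(self, timestamp: str) -> int:
        self.epochs.append(timestamp)
        return len(self.epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> str:
        sid = f"{sv}|{band}|{code}"
        # Register each SID once; repeated calls return the same ID.
        if sid not in self._sid_index:
            self._sid_index[sid] = len(self.sids)
            self.sids.append(sid)
        return sid

    def set_value(self, epoch_index: int, sid: str, var: str, value: float) -> None:
        self._values[(epoch_index, self._sid_index[sid], var)] = value

    def build(self) -> dict[str, list[list[float]]]:
        # Densify: one epoch x sid grid per variable, NaN where nothing was set.
        variables = sorted({var for (_, _, var) in self._values})
        grids = {
            var: [[math.nan] * len(self.sids) for _ in self.epochs]
            for var in variables
        }
        for (ei, si, var), value in self._values.items():
            grids[var][ei][si] = value
        return grids


builder = GridBuilder()
ei = builder.add_epoch("2025-01-01T00:00:00")
g01 = builder.add_signal(sv="G01", band="L1", code="C")
e05 = builder.add_signal(sv="E05", band="L5", code="Q")
builder.set_value(ei, g01, "SNR", 42.0)
grids = builder.build()
print(builder.sids)  # ['G01|L1|C', 'E05|L5|Q']
print(grids["SNR"])  # [[42.0, nan]]
```

Deferring allocation to `build()` means signals first seen late in a file still get a column in every epoch.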
### GNSS specifications

`gnss_specs` provides constellation definitions for GPS, Galileo, GLONASS, BeiDou, QZSS, and SBAS, including band mappings and centre frequencies.

```python
from canvod.readers.gnss_specs.constellations import GPS

gps = GPS()  # static SVs from the IGS SINEX catalog
gps.BANDS    # {'1': 'L1', '2': 'L2', '5': 'L5'}
```
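A quick way to sanity-check GPS centre frequencies in a band table is to recall that they are integer multiples of the 10.23 MHz fundamental frequency (154, 120, and 115 for L1, L2, and L5). The standalone snippet below is a cross-check based on that relationship, not part of `gnss_specs`; `GPS_BANDS` and `centre_frequency_mhz` are illustrative names.

```python
F0_MHZ = 10.23  # GPS fundamental frequency

# RINEX band digit -> (band name, multiple of F0); GPS only
GPS_BANDS = {"1": ("L1", 154), "2": ("L2", 120), "5": ("L5", 115)}


def centre_frequency_mhz(band_digit: str) -> float:
    """Centre frequency in MHz, rounded to remove float noise."""
    _, multiplier = GPS_BANDS[band_digit]
    return round(multiplier * F0_MHZ, 2)


for digit, (name, _) in GPS_BANDS.items():
    print(f"{name}: {centre_frequency_mhz(digit)} MHz")
# L1: 1575.42 MHz
# L2: 1227.6 MHz
# L5: 1176.45 MHz
```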
### Signal ID mapper

`SignalIDMapper` provides frequency, bandwidth, and overlap-group lookups for canonical `SV|band|code` signal IDs. In the fast-path reader, SIDs are constructed directly from the header observation codes.

```python
mapper = SignalIDMapper()
freq = mapper.get_band_frequency("L1")  # → 1575.42 (MHz)
bw = mapper.get_band_bandwidth("L1")    # → 30.69 (MHz)
```
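One plausible reading of an overlap-group lookup is that two bands overlap when their intervals [f − B/2, f + B/2] intersect, and that overlapping bands are chained into groups. The sketch below implements that reading with an interval sweep. Everything here is an assumption (the function names, the grouping rule, and all bandwidths except the L1 values from the mapper example) and is not taken from `SignalIDMapper`.

```python
# Assumptions: bandwidths are full widths in MHz centred on the band
# frequency, and overlap groups are connected chains of overlapping bands.
BANDS_MHZ = {
    "L1": (1575.42, 30.69),   # values from the mapper example above
    "E1": (1575.42, 24.552),  # illustrative
    "L2": (1227.60, 30.69),   # illustrative
    "L5": (1176.45, 24.0),    # illustrative
    "E5a": (1176.45, 20.46),  # illustrative
}


def band_interval(centre: float, bandwidth: float) -> tuple[float, float]:
    """Return the (low, high) edges of a band in MHz."""
    half = bandwidth / 2
    return centre - half, centre + half


def overlap_groups(bands: dict[str, tuple[float, float]]) -> list[list[str]]:
    """Sweep bands by lower edge, merging any that overlap the running group."""
    intervals = sorted(
        (band_interval(centre, bw), name) for name, (centre, bw) in bands.items()
    )
    groups: list[list[str]] = []
    group_high = float("-inf")
    for (low, high), name in intervals:
        if groups and low <= group_high:
            groups[-1].append(name)  # overlaps the current group
            group_high = max(group_high, high)
        else:
            groups.append([name])  # gap in frequency: start a new group
            group_high = high
    return groups


print(overlap_groups(BANDS_MHZ))
# [['L5', 'E5a'], ['L2'], ['L1', 'E1']]
```

Sorting by lower edge lets a single pass find each group, because any band that overlaps the current group must start before the group's highest edge.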
### `validate_dataset()`

Every dataset produced by any reader must pass structural validation before it is returned. The check covers dimensions, coordinate dtypes, required variables, and global attributes.

```python
from canvod.readers.base import validate_dataset

validate_dataset(ds)  # raises ValueError listing ALL violations
```
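Listing all violations in one error is a collect-then-raise pattern: every check appends to an error list, and a single `ValueError` is raised at the end. The sketch below applies it to a plain dictionary that describes a dataset, so it runs without xarray; `validate_description` and the required dimensions, dtypes, variables, and attribute names are hypothetical, not canvod's schema.

```python
# Hypothetical schema, for illustration only.
REQUIRED_DIMS = ("epoch", "sid")
REQUIRED_COORD_DTYPES = {"epoch": "datetime64[ns]"}
REQUIRED_VARS = ("SNR",)
REQUIRED_ATTRS = ("Created", "Software")


def validate_description(desc: dict) -> None:
    """Check a plain-dict dataset description; raise once with all errors."""
    errors: list[str] = []
    dims = tuple(desc.get("dims", ()))
    if dims != REQUIRED_DIMS:
        errors.append(f"dims must be {REQUIRED_DIMS}, got {dims}")
    coords = desc.get("coords", {})
    for name, dtype in REQUIRED_COORD_DTYPES.items():
        if coords.get(name) != dtype:
            errors.append(f"coordinate {name!r} must be {dtype}, got {coords.get(name)}")
    for var in REQUIRED_VARS:
        if var not in desc.get("data_vars", {}):
            errors.append(f"missing data variable {var!r}")
    for attr in REQUIRED_ATTRS:
        if attr not in desc.get("attrs", {}):
            errors.append(f"missing global attribute {attr!r}")
    # Raise only after every check has run, so all violations are reported.
    if errors:
        raise ValueError("dataset failed validation:\n- " + "\n- ".join(errors))


try:
    validate_description({"dims": ("sid", "epoch")})
except ValueError as exc:
    print(exc)  # one message listing five violations
```

Reporting everything at once saves a fix-rerun cycle per violation when a new reader is being developed.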
## Performance

### Single-Pass Parser
`Rnxv3Obs` uses a single-pass parser (`_create_dataset_single_pass`) that pre-computes the full signal ID (SID) space from the RINEX header and fills pre-allocated NumPy arrays in one pass over the file. This avoids the overhead of:

- **Per-observation object allocation:** inline string parsing (`_parse_obs_fast`) replaces Pydantic model instantiation.
- **Repeated signal ID lookups:** a pre-built lookup table maps `(SV, obs_code)` directly to an array index.
- **Redundant header re-parsing:** SIDs are derived once from header metadata via `_precompute_sids_from_header()`.
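The three optimisations combine as follows: derive the SIDs and an `(SV, obs_code)` → column table once, pre-allocate the epoch × sid arrays, then fill them by slicing each observation line at fixed offsets. The sketch below is a simplified illustration with hypothetical function names and plain lists standing in for NumPy arrays. It relies on the RINEX 3 record layout (a 3-character satellite ID followed by one 16-character field per observation type: an F14.3 value, a loss-of-lock digit, and a signal-strength digit), keeps only `S` observations as `SNR`, takes the SV list as an input, and ignores epoch header lines and event flags.

```python
import math

# Assumptions: GPS band digits as in the gps.BANDS example, and only the
# "S" (signal strength) observation type is kept, stored as "SNR".
BANDS = {"1": "L1", "2": "L2", "5": "L5"}
OBS_VARS = {"S": "SNR"}
FIELD = 16  # F14.3 value + 1-digit LLI + 1-digit signal strength


def precompute_sids(svs, obs_codes):
    """Derive SIDs once and map (SV, obs_code) -> (column, variable)."""
    sid_index: dict[str, int] = {}
    lookup: dict[tuple[str, str], tuple[int, str]] = {}
    for sv in svs:
        for code in obs_codes:  # e.g. "S1C": type, band digit, attribute
            var = OBS_VARS.get(code[0])
            if var is None:
                continue
            sid = f"{sv}|{BANDS[code[1]]}|{code[2]}"
            col = sid_index.setdefault(sid, len(sid_index))
            lookup[(sv, code)] = (col, var)
    return list(sid_index), lookup


def parse_obs_fast(line, obs_codes):
    """Yield (sv, obs_code, value) by fixed-width slicing, no objects."""
    sv = line[:3]
    for k, code in enumerate(obs_codes):
        start = 3 + k * FIELD
        text = line[start:start + 14].strip()
        if text:  # blank field = observation not available
            yield sv, code, float(text)


def fill_single_pass(epochs, svs, obs_codes):
    sids, lookup = precompute_sids(svs, obs_codes)
    # Pre-allocate one epoch x sid grid per variable.
    grids = {
        var: [[math.nan] * len(sids) for _ in epochs]
        for var in set(OBS_VARS.values())
    }
    for ei, lines in enumerate(epochs):
        for line in lines:
            for sv, code, value in parse_obs_fast(line, obs_codes):
                hit = lookup.get((sv, code))
                if hit is not None:  # skip types this sketch does not keep
                    col, var = hit
                    grids[var][ei][col] = value
    return sids, grids


def record(sv, values):
    """Format a test record: 14-char values, blank flags, None = missing."""
    return sv + "".join(" " * FIELD if v is None else f"{v:14.3f}  " for v in values)


obs_codes = ["C1C", "L1C", "S1C", "S5Q"]
epochs = [
    [record("G01", [20000000.123, 105000000.456, 45.0, None])],
    [record("G02", [21000000.0, None, 38.5, 40.25])],
]
sids, grids = fill_single_pass(epochs, ["G01", "G02"], obs_codes)
print(sids)          # ['G01|L1|C', 'G01|L5|Q', 'G02|L1|C', 'G02|L5|Q']
print(grids["SNR"])  # [[45.0, nan, nan, nan], [nan, nan, 38.5, 40.25]]
```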
## Tips
**Memory:** Use `keep_data_vars=["SNR"]` to load only what you need. A full RINEX dataset with phase and Doppler uses roughly 4× as much memory.
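The ~4× figure follows from simple arithmetic, since memory grows linearly with the number of epoch × sid variables kept. The estimate below assumes float64 storage, a daily file at 30 s sampling (2880 epochs), 300 SIDs, and four full-RINEX variables (pseudorange, phase, Doppler, SNR); all of these are illustrative assumptions, and coordinates and flag variables are ignored.

```python
def dataset_bytes(n_epochs: int, n_sids: int, n_vars: int, itemsize: int = 8) -> int:
    """Rough size of n_vars dense epoch x sid arrays (coordinates ignored)."""
    return n_epochs * n_sids * n_vars * itemsize


n_epochs = 24 * 60 * 60 // 30  # daily file at 30 s sampling -> 2880 epochs
n_sids = 300                   # illustrative SID count

snr_only = dataset_bytes(n_epochs, n_sids, n_vars=1)
full = dataset_bytes(n_epochs, n_sids, n_vars=4)  # pseudorange, phase, Doppler, SNR
print(f"SNR only: {snr_only / 1e6:.1f} MB")  # SNR only: 6.9 MB
print(f"Full:     {full / 1e6:.1f} MB")      # Full:     27.6 MB
print(full / snr_only)                       # 4.0
```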
**Batch processing:** For many files, the orchestrator uses Dask Distributed with a `LocalCluster` for parallel processing, and each worker handles one file at a time. If Dask is unavailable, it falls back to `ProcessPoolExecutor`. See the Dask & Resource Management guide for configuration and monitoring.
**Storage:** After processing, write to Icechunk via canvod-store for compressed, versioned storage with O(1) epoch lookups.