Extending Readers¶
Add support for a new GNSS data format by implementing the `GNSSDataReader` abstract base class. canvod-readers uses the ABC pattern to enforce a consistent contract — any reader that passes the checklist below can be used anywhere a `GNSSDataReader` is accepted.
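The contract-enforcement idea behind the ABC pattern can be shown in miniature. This is a generic sketch with made-up names (`Reader`, `parse`), not canvod code:

```python
from abc import ABC, abstractmethod

class Reader(ABC):
    """Toy contract: every concrete reader must implement parse()."""

    @abstractmethod
    def parse(self) -> list[str]:
        """Return the parsed records."""

class GoodReader(Reader):
    def parse(self) -> list[str]:
        return ["epoch-1"]

class BadReader(Reader):
    # Skips the abstract method, so it cannot be instantiated.
    pass

good = GoodReader()  # fine
error = ""
try:
    BadReader()      # raises TypeError mentioning the missing method
except TypeError as exc:
    error = str(exc)
```

Because the check happens at instantiation time, an incomplete reader fails loudly instead of breaking downstream consumers.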
Implementation Checklist¶
1. **Inherit correctly**
   `class MyReader(GNSSDataReader)` — just one parent! `GNSSDataReader` is already a Pydantic `BaseModel` with `fpath` and file validation built in.
2. **Implement abstract methods**
   `file_hash`, `to_ds()`, `iter_epochs()`, `start_time`, `end_time`, `systems`, `num_satellites`. (`num_epochs` has a default that counts via `iter_epochs()` — override for O(1) if your format stores the count.)
3. **Use `DatasetBuilder` (recommended)**
   Use `DatasetBuilder` in your `to_ds()` to construct the output Dataset. It handles coordinate arrays, frequency resolution, dtype enforcement, and calls `validate_dataset()` automatically.
4. **Write tests**
   Test structure, file hash, error paths, and the validation round-trip. Aim for >90% coverage.
Contract Constants¶
The output Dataset contract is defined by importable constants in `canvod.readers.base` — these are the single source of truth:

```python
from canvod.readers.base import (
    REQUIRED_DIMS,          # ("epoch", "sid")
    REQUIRED_COORDS,        # {name: dtype, ...}
    REQUIRED_ATTRS,         # {"Created", "Software", "Institution", "File Hash"}
    DEFAULT_REQUIRED_VARS,  # ["SNR"]
)
```
Use `validate_dataset(ds)` to check any Dataset against them. It collects all violations and raises a single `ValueError` listing every problem.
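The collect-then-raise behaviour is worth imitating in your own checks. A minimal sketch of the pattern (a generic illustration of the described semantics, not `validate_dataset` itself — `check_contract` is a hypothetical name):

```python
def check_contract(dims: tuple[str, ...], attrs: set[str]) -> None:
    """Collect every violation first, then raise a single ValueError."""
    problems: list[str] = []
    for dim in ("epoch", "sid"):
        if dim not in dims:
            problems.append(f"missing dimension: {dim}")
    for attr in ("Created", "Software", "Institution", "File Hash"):
        if attr not in attrs:
            problems.append(f"missing attribute: {attr}")
    if problems:
        raise ValueError("contract violations:\n" + "\n".join(problems))

report = ""
try:
    # Deliberately incomplete input: one dimension, one attribute
    check_contract(dims=("epoch",), attrs={"Created"})
except ValueError as exc:
    report = str(exc)
```

Reporting every problem at once saves the fix-rerun-fix loop you get from fail-fast validators.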
Step-by-Step Implementation¶
Step 1 — Reader Class¶
`GNSSDataReader` is a Pydantic `BaseModel` with `fpath: Path` and file validation built in. You only need to set reader-specific config:

```python
from pydantic import ConfigDict

from canvod.readers.base import GNSSDataReader

class MyFormatReader(GNSSDataReader):
    """Reader for My Custom Format."""

    model_config = ConfigDict(frozen=True)  # no arbitrary_types needed

    # no fpath field needed — inherited from GNSSDataReader
```
Step 2 — File Hash¶
```python
from canvod.readers.gnss_specs.utils import file_hash

class MyFormatReader(GNSSDataReader):
    ...

    @property
    def file_hash(self) -> str:
        """16-character SHA-256 prefix of the file — used for deduplication."""
        return file_hash(self.fpath)
```
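If you are curious what the helper must be doing, a stdlib sketch that reproduces the documented "16-character SHA-256 prefix" behaviour looks like this (`sha256_prefix` is a hypothetical stand-in; the real helper lives in `canvod.readers.gnss_specs.utils`):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_prefix(fpath: Path, length: int = 16) -> str:
    """Stream the file in chunks so large files stay memory-bounded."""
    digest = hashlib.sha256()
    with fpath.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]

# Demo on a throwaway file
demo = Path(tempfile.mkdtemp()) / "demo.dat"
demo.write_bytes(b"content")
h = sha256_prefix(demo)
```

The prefix is deterministic for identical file contents, which is what makes it usable as a deduplication key.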
Step 3 — Metadata Properties¶
```python
from datetime import datetime

class MyFormatReader(GNSSDataReader):
    ...

    @property
    def start_time(self) -> datetime:
        return self._parse_start_time()

    @property
    def end_time(self) -> datetime:
        return self._parse_end_time()

    @property
    def systems(self) -> list[str]:
        return self._parse_systems()  # e.g. ["G", "E"]

    # num_epochs has a default (iterates via iter_epochs);
    # override for O(1) if your format stores the count in the header.

    @property
    def num_satellites(self) -> int:
        return self._count_satellites()
```
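The trade-off behind the `num_epochs` override can be sketched with a toy reader (hypothetical names throughout; it is assumed from the text above that the base-class default counts by exhausting `iter_epochs()`):

```python
from collections.abc import Iterator

class ToyReader:
    """Toy format that stores the epoch count in its header."""

    def __init__(self, epochs: list[float], header_count: int):
        self._epochs = epochs
        self._header_count = header_count

    def iter_epochs(self) -> Iterator[float]:
        yield from self._epochs

    def num_epochs_counted(self) -> int:
        # Default-style behaviour: O(n), touches every epoch
        return sum(1 for _ in self.iter_epochs())

    @property
    def num_epochs(self) -> int:
        # Override-style behaviour: O(1), read straight from the header
        return self._header_count

reader = ToyReader(epochs=[0.0, 30.0, 60.0], header_count=3)
```

For a multi-gigabyte binary file, the counted version re-reads the whole file, so the header-based override is worth the few extra lines.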
Step 4 — Epoch Iterator¶
```python
from collections.abc import Iterator

class MyFormatReader(GNSSDataReader):
    ...

    def iter_epochs(self) -> Iterator:
        """Lazily yield one epoch at a time — keep memory bounded."""
        with self.fpath.open("rb") as f:
            self._skip_header(f)
            for raw in self._raw_epoch_generator(f):
                yield self._decode_epoch(raw)
```
Step 5 — Dataset Conversion with DatasetBuilder¶
`DatasetBuilder` handles coordinate assembly, frequency resolution, dtype enforcement, and validation — so your `to_ds()` stays simple:

```python
import xarray as xr

from canvod.readers.builder import DatasetBuilder

class MyFormatReader(GNSSDataReader):
    ...

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs,
    ) -> xr.Dataset:
        builder = DatasetBuilder(self)
        for epoch in self.iter_epochs():
            ei = builder.add_epoch(epoch.timestamp)
            for obs in epoch.observations:
                sig = builder.add_signal(
                    sv=obs.sv, band=obs.band, code=obs.code
                )
                builder.set_value(ei, sig, "SNR", obs.snr_value)
        return builder.build(
            keep_data_vars=keep_data_vars,
            extra_attrs={"Source Format": "My Custom Format"},
        )
```
**Manual Dataset construction (advanced)**

If you need more control than `DatasetBuilder` provides, you can construct the Dataset manually using `SignalIDMapper` and `validate_dataset()`:

```python
import numpy as np
import xarray as xr

from canvod.readers.base import validate_dataset
from canvod.readers.gnss_specs.metadata import COORDS_METADATA, SNR_METADATA
from canvod.readers.gnss_specs.signals import SignalIDMapper

class MyFormatReader(GNSSDataReader):
    ...

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs,
    ) -> xr.Dataset:
        all_epochs = list(self.iter_epochs())
        mapper = SignalIDMapper()

        # Build SID index, coordinate arrays, data arrays...
        # (see existing readers for a full example)

        ds = xr.Dataset(
            data_vars={"SNR": (("epoch", "sid"), snr, SNR_METADATA)},
            coords={...},
            attrs={**self._build_attrs(), "Source Format": "My Custom Format"},
        )

        # MANDATORY — validate before returning
        validate_dataset(ds, required_vars=keep_data_vars)
        return ds
```
Step 6 — to_ds_and_auxiliary() (optional)¶
If your format embeds metadata beyond observations (like SBF embeds satellite geometry), override `to_ds_and_auxiliary()` to collect both datasets in a single file scan:

```python
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    obs_ds = ...   # build the observation dataset
    meta_ds = ...  # build the metadata dataset
    return obs_ds, {"my_format_meta": meta_ds}
```

The default implementation calls `to_ds()` and returns an empty dict.
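The documented default behaviour amounts to something like the following (a sketch of the described semantics with toy classes, not the actual base-class source):

```python
class SketchBase:
    """Stand-in for the reader base class."""

    def to_ds(self, **kwargs):
        raise NotImplementedError

    def to_ds_and_auxiliary(self, **kwargs):
        # Default: delegate to to_ds() and report no auxiliary datasets
        return self.to_ds(**kwargs), {}

class SketchReader(SketchBase):
    def to_ds(self, **kwargs):
        return {"SNR": [42.0]}  # stand-in for an xr.Dataset

ds, aux = SketchReader().to_ds_and_auxiliary()
```

So a reader with no embedded metadata gets the two-tuple API for free and never needs to override this method.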
Validation Requirements¶
```python
from canvod.readers.base import REQUIRED_ATTRS, REQUIRED_COORDS

# Required dimensions
assert "epoch" in ds.dims
assert "sid" in ds.dims

# REQUIRED_COORDS = {
#     "epoch": "datetime64[ns]",
#     "sid": "object",            # string
#     "sv": "object",
#     "system": "object",
#     "band": "object",
#     "code": "object",
#     "freq_center": "float32",   # must be float32
#     "freq_min": "float32",
#     "freq_max": "float32",
# }

# REQUIRED_ATTRS = {
#     "Created",
#     "Software",
#     "Institution",
#     "File Hash",  # for storage deduplication
# }

# SNR required by default
assert "SNR" in ds.data_vars

# All variables must be (epoch, sid)
for var in ds.data_vars:
    assert ds[var].dims == ("epoch", "sid")
```
Testing¶
```python
# tests/test_my_format_reader.py
import pytest

from canvod.readers.base import validate_dataset
from my_package.readers import MyFormatReader

class TestMyFormatReader:
    def test_file_hash_is_deterministic(self, tmp_path):
        f = tmp_path / "test.dat"
        f.write_bytes(b"content")
        reader = MyFormatReader(fpath=f)
        assert reader.file_hash == reader.file_hash
        assert len(reader.file_hash) == 16

    def test_dataset_dimensions(self, real_test_file):
        ds = MyFormatReader(fpath=real_test_file).to_ds()
        assert "epoch" in ds.dims
        assert "sid" in ds.dims

    def test_dataset_variables(self, real_test_file):
        ds = MyFormatReader(fpath=real_test_file).to_ds()
        assert "SNR" in ds.data_vars

    def test_sid_dimensions(self, real_test_file):
        ds = MyFormatReader(fpath=real_test_file).to_ds()
        for var in ds.data_vars:
            assert ds[var].dims == ("epoch", "sid")

    def test_file_hash_in_attrs(self, real_test_file):
        ds = MyFormatReader(fpath=real_test_file).to_ds()
        assert "File Hash" in ds.attrs

@pytest.mark.integration
def test_full_pipeline(real_test_file):
    reader = MyFormatReader(fpath=real_test_file)
    ds = reader.to_ds(keep_data_vars=["SNR"])

    # Filter GPS only
    gps = ds.where(ds.system == "G", drop=True)
    assert len(gps.sid) > 0

    # Sanity-check values
    assert float(gps.SNR.mean()) > 0

def test_validation_passes(real_test_file):
    ds = MyFormatReader(fpath=real_test_file).to_ds()
    # validate_dataset() is already called inside to_ds() —
    # this test verifies it didn't raise
    validate_dataset(ds)  # should not raise
```
Audit Integration¶
Adding a new reader is not complete until the audit suite can validate its output against existing readers. This intra-validation ensures that your new reader produces scientifically consistent results when processing the same observation data.

- **Add a Tier 1 comparison** — compare your reader's output against an existing reader on shared test data (same receiver, same time window). Follow the pattern in `canvod.audit.runners.sbf_vs_rinex`:

  ```python
  from canvod.audit.runners import audit_sbf_vs_rinex

  # Your equivalent: audit_myformat_vs_rinex(...)
  result = audit_sbf_vs_rinex(sbf_store="...", rinex_store="...")
  assert result.passed
  ```

- **Define tolerances** — SNR should be bit-identical if the underlying data is the same. Angular values (θ, φ) may differ if ephemeris sources differ. Use the appropriate `ToleranceTier` (EXACT, NUMERICAL, or SCIENTIFIC) and document expected differences.

- **Add integration tests** in `packages/canvod-audit/tests/test_integration.py` — verify dataset structure, shared observables, and value ranges against real data from the test submodule.

- **Run the full audit** after implementation:

  ```bash
  just test-audit
  ```
See the Audit Suite for the full tier system.
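The three tiers map onto familiar comparison strategies. A generic numpy sketch (the tier names come from the text above; `compare` and the thresholds are invented purely for illustration and are not the `ToleranceTier` API):

```python
import numpy as np

def compare(a: np.ndarray, b: np.ndarray, tier: str) -> bool:
    """EXACT: bit-identical. NUMERICAL: tight float tolerance.
    SCIENTIFIC: looser, domain-level tolerance (illustrative threshold)."""
    if tier == "EXACT":
        return bool(np.array_equal(a, b))
    if tier == "NUMERICAL":
        return bool(np.allclose(a, b, rtol=1e-9, atol=0.0))
    if tier == "SCIENTIFIC":
        return bool(np.allclose(a, b, atol=0.5))  # e.g. sub-degree agreement
    raise ValueError(f"unknown tier: {tier!r}")

# SNR decoded from the same raw data should match bit-for-bit
snr_a = np.array([42.0, 45.5], dtype=np.float32)
snr_b = snr_a.copy()

# Angles computed from different ephemeris sources may differ slightly
theta_a = np.array([10.00, 20.00])
theta_b = np.array([10.10, 19.95])
```

The point of documenting a tier per observable is that a mismatch is only a bug when it exceeds the tolerance you committed to.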
Common Pitfalls¶
**Wrong dtype for frequency coordinates**

```python
# WRONG — float64 fails the dtype check
freq_center = np.array([...], dtype=np.float64)

# CORRECT — DatasetBuilder handles this automatically
freq_center = np.array([...], dtype=np.float32)
```
**Skipping validation**

```python
# WRONG — missing mandatory validation
def to_ds(self, **kwargs) -> xr.Dataset:
    ds = self._build_dataset()
    return ds  # ← will silently produce invalid datasets downstream

# CORRECT — DatasetBuilder.build() calls validate_dataset() for you
def to_ds(self, **kwargs) -> xr.Dataset:
    builder = DatasetBuilder(self)
    # ... add epochs, signals, values ...
    return builder.build()  # validates automatically
```
**Wrong dimension names**

```python
# WRONG
data_vars={"SNR": (("time", "signal"), data)}

# CORRECT — DatasetBuilder uses the right names automatically
data_vars={"SNR": (("epoch", "sid"), data)}
```
Registering with ReaderFactory¶
```python
from canvodpy import ReaderFactory

from my_package.readers import MyFormatReader

# Register by name
ReaderFactory.register("my_format", MyFormatReader)

# Create by name
reader = ReaderFactory.create("my_format", fpath="file.dat")
```

For RINEX files, `ReaderFactory.create_from_file(path)` auto-detects v2/v3 from the file header. Custom binary formats should use the name-based API above.
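The name-based API suggests a simple registry underneath. A minimal sketch of that pattern (`MiniFactory` and `DemoReader` are made-up names; this is not the actual ReaderFactory implementation):

```python
class MiniFactory:
    """Dict-backed registry mapping format names to reader classes."""

    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, name: str, reader_cls: type) -> None:
        cls._registry[name] = reader_cls

    @classmethod
    def create(cls, name: str, **kwargs):
        try:
            reader_cls = cls._registry[name]
        except KeyError:
            raise ValueError(f"unknown reader: {name!r}") from None
        # Keyword arguments (e.g. fpath) are forwarded to the reader
        return reader_cls(**kwargs)

class DemoReader:
    def __init__(self, fpath: str):
        self.fpath = fpath

MiniFactory.register("demo", DemoReader)
reader = MiniFactory.create("demo", fpath="file.dat")
```

Registering at import time keeps callers decoupled from concrete reader classes: they only ever mention a format name.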