canvod-audit¶
Audit, comparison, and regression verification for canvodpy GNSS-VOD pipelines.
canvod-audit is a workspace package that provides a structured, scientifically rigorous framework for verifying that canvodpy produces correct results. It exists because tests alone are not sufficient for a scientific processing pipeline: unit tests verify code logic, while audits verify that the physical outputs are correct.
Why this package exists¶
canvodpy is a scientific data processing pipeline. A subtle coordinate transformation bug, a sign flip in an angle, or a quiet change in NaN handling can produce outputs that look fine but are scientifically wrong.
Traditional software testing catches many of these issues, but not all:
- Tests operate on tiny synthetic data. A test with 10 epochs and 3 satellites cannot reveal a memory-layout bug that only manifests at 86,400 epochs and 321 satellites.
- Tests don't cross implementation boundaries. A unit test for canvodpy's VOD retrieval cannot tell you whether it agrees with a completely independent implementation of the same algorithm.
- Tests don't survive refactors. When you rewrite the RINEX reader, how do you prove the new version produces the same outputs as the old one?
- Tests don't document tolerances. Even when two outputs should differ (SBF quantizes SNR to 0.25 dB; RINEX to ~0.001 dB), you need to state and justify the expected difference.
Four tiers + infrastructure¶
```mermaid
graph LR
    subgraph "Tier 0: Self-Consistency"
        D0["L1 store"] -->|same config| E0["audit_api_levels()"]
        F0["L2/L3/L4"] -->|same config| E0
    end
    subgraph "Tier 1: Internal Consistency"
        A["SBF reader"] -->|same data| C["audit_sbf_vs_rinex()"]
        B["RINEX reader"] -->|same data| C
        D["Broadcast ephemeris"] -->|same orbits| E["audit_ephemeris_sources()"]
        F["Agency ephemeris"] -->|same orbits| E
    end
    subgraph "Tier 2: Regression"
        G["Current code"] -->|today's output| H["audit_regression()"]
        I["Frozen checkpoint"] -->|known-good output| H
    end
    subgraph "Tier 3: External"
        J["canvodpy output"] --> K["audit_vs_gnssvod()"]
        L["gnssvod output"] --> K
    end
    E0 --> M{{"PASS / FAIL"}}
    C --> M
    E --> M
    H --> M
    K --> M
```
| Tier | What it answers | When to run |
|---|---|---|
| Tier 0: Self-consistency | Do all API levels produce identical output? | After any algorithm or API change |
| Tier 1: Internal consistency | Do different code paths through canvodpy agree? | After changing readers, ephemeris providers, or coordinate transforms |
| Tier 2: Regression | Did a code change alter the outputs? | After any code change (CI) |
| Tier 3: External | Does canvodpy agree with independent implementations? | Before publication, after major algorithm changes |
| Infrastructure | Is the pipeline deterministic, serialization-safe, and chunk-invariant? | After store/pipeline changes |
Core concepts¶
Tolerance tiers¶
Three strictness levels formalise what "close enough" means:
| Tier | Tolerances | Use case |
|---|---|---|
| EXACT | `atol=0, rtol=0` | Self-comparison, same code path |
| NUMERICAL | `atol=1e-12, rtol=1e-10` | Same algorithm, different float ordering |
| SCIENTIFIC | Per-variable, physically justified | Cross-implementation, different data sources |
SCIENTIFIC tolerances must include a description justifying the value:
```python
Tolerance(
    atol=0.25, rtol=0.0, nan_rate_atol=0.01,
    description="SBF firmware quantises SNR to 0.25 dB steps",
)
```
NaN handling¶
GNSS data is full of NaNs (satellites below horizon, tracking loss). canvod-audit tracks NaN patterns explicitly:
- `nan_agreement_rate` — fraction of elements where both arrays agree on NaN status
- `nan_rate_atol` — maximum allowed difference in NaN rates
- All statistics are computed only over mutually non-NaN elements
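A small numpy sketch of the NaN bookkeeping described above (illustrative only; the variable names mirror the statistics, not the package's internals):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, np.nan])
b = np.array([1.1, np.nan, np.nan, 2.0])

# Fraction of elements where both arrays agree on NaN status
nan_a, nan_b = np.isnan(a), np.isnan(b)
nan_agreement_rate = float(np.mean(nan_a == nan_b))  # both agree on 2 of 4 -> 0.5

# Statistics are computed only over mutually non-NaN elements
both_valid = ~nan_a & ~nan_b
bias = float(np.mean(a[both_valid] - b[both_valid]))  # only the first element counts
print(nan_agreement_rate, bias)
```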
Dataset alignment¶
`compare_datasets()` automatically finds the intersection of the `epoch` and `sid` coordinates, aligns both datasets, and reports what was dropped in `AlignmentInfo`.
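Conceptually this is an inner join on shared coordinates. A minimal, self-contained sketch of that alignment step with plain Python lists (the real runner operates on xarray coordinates; `align_coords` is a hypothetical helper for illustration):

```python
def align_coords(coords_a: list[str], coords_b: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Inner-join two coordinate lists; report what each side dropped."""
    set_a, set_b = set(coords_a), set(coords_b)
    common = [c for c in coords_a if c in set_b]       # kept, in a's order
    dropped_a = [c for c in coords_a if c not in set_b]
    dropped_b = [c for c in coords_b if c not in set_a]
    return common, dropped_a, dropped_b

sids_a = ["G01|L1|C", "G02|L1|C", "E01|E1|C"]
sids_b = ["G01|L1|C", "G02|L1|C", "R01|L1|C"]
common, dropped_a, dropped_b = align_coords(sids_a, sids_b)
print(common)     # ['G01|L1|C', 'G02|L1|C']
print(dropped_a)  # ['E01|E1|C']
```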
Runners¶
Tier 0: Self-consistency¶
```python
from canvod.audit.runners import audit_api_levels

result = audit_api_levels(
    stores={"l1": "/path/to/l1", "l2": "/path/to/l2"},
    reference_level="l1",
    store_type="rinex",
)
```
Tier 1: Internal consistency¶
**Extending the audit is mandatory.** Adding a new reader, ephemeris source, or processing path is not complete until a corresponding Tier 1 audit runner exists. Every new code path must be cross-validated against at least one existing path on shared test data.
```python
from canvod.audit.runners import audit_sbf_vs_rinex, audit_ephemeris_sources

r1a = audit_sbf_vs_rinex(sbf_store="stores/sbf", rinex_store="stores/rinex")
r1b = audit_ephemeris_sources(
    broadcast_store="stores/broadcast",
    agency_store="stores/agency",
)
```
Tier 2: Regression¶
Freeze a known-good output as a checkpoint, then compare future outputs against it:
```python
from canvod.audit.runners import freeze_checkpoint

freeze_checkpoint(
    store="/path/to/store",
    group="canopy_01",
    output_dir="/path/to/checkpoints",
    version="0.3.0",
    metadata={"git_hash": "abc123", "date": "2026-03-10"},
)
```

```python
from canvod.audit.runners import audit_regression

result = audit_regression(store="/path/to/store", checkpoint_dir="checkpoints/")
assert result.passed, result.summary()
```
Tier 3: External intercomparison¶
Compares canvodpy against an independent implementation (e.g. gnssvod by Humphrey et al.).
The SID vs PRN problem: canvodpy identifies signals by SID (`G01|L1|C`), while gnssvod identifies satellites by PRN (`G01`). The solution is to trim the RINEX file to one observation code per band before feeding it to both tools.
```python
from canvod.audit.rinex_trimmer import RinexTrimmer, gps_galileo_l1_l2

# Trim to one code per band (requires gfzrnx on PATH)
trimmer = RinexTrimmer(rinex_path, obs_types=gps_galileo_l1_l2())
trimmed_path = trimmer.write("/path/to/output/")

# Run comparison
from canvod.audit.runners import audit_vs_gnssvod

result = audit_vs_gnssvod(
    canvodpy_store="/path/to/canvodpy_store",
    gnssvod_file="/path/to/gnssvod_output.csv",
    group="canopy_01",
)
```
Infrastructure checks¶
```python
from canvod.audit.runners import (
    audit_store_round_trip,
    audit_temporal_chunking,
    audit_idempotency,
    audit_constellation_filter,
)

audit_store_round_trip(store="stores/canvodpy")
audit_temporal_chunking(monolithic_store="stores/7days", chunked_store="stores/chunked")
audit_idempotency(run1_store="stores/run1", run2_store="stores/run2")
audit_constellation_filter(
    all_constellations_store="stores/all",
    filtered_store="stores/gps",
    system_prefix="G",
)
```
Core API¶
compare_datasets()¶
```python
from canvod.audit import compare_datasets, ToleranceTier, Tolerance

result = compare_datasets(
    ds_a, ds_b,
    variables=["SNR", "vod"],
    tier=ToleranceTier.SCIENTIFIC,
    tolerance_overrides={"SNR": Tolerance(atol=0.5, rtol=0.0)},
    label="canvodpy vs gnssvod",
)
```
Returns a `ComparisonResult` with:

| Attribute | Description |
|---|---|
| `.passed` | `True` if all variables within tolerance |
| `.failures` | `dict[str, str]` — variable → failure reason |
| `.variable_stats` | `dict[str, VariableStats]` — per-variable statistics |
| `.alignment` | `AlignmentInfo` — what was kept/dropped |
| `.summary()` | Human-readable summary |
| `.to_polars()` | Polars DataFrame of per-variable stats |
VariableStats¶
| Statistic | Description |
|---|---|
| `rmse` | Root mean square error |
| `bias` | Mean difference (a - b) |
| `mae` | Mean absolute error |
| `max_abs_diff` | Worst-case disagreement |
| `correlation` | Pearson correlation |
| `nan_agreement_rate` | NaN pattern consistency |
| `n_compared` | Effective sample size |
Reporting¶
```python
from canvod.audit.reporting.tables import to_polars, to_markdown, to_latex
from canvod.audit.reporting.figures import (
    plot_diff_histogram,
    plot_scatter,
    plot_summary_dashboard,
)

# Tables
print(to_markdown(result))
print(to_latex(result, caption="SBF vs RINEX", label="tab:sbf-rinex"))

# Figures
fig = plot_summary_dashboard(result)
```
Extending the audit framework¶
Adding a new runner¶
1. Create a function in `canvod.audit.runners` following the pattern:

   ```python
   def audit_my_comparison(store_a: str, store_b: str, **kwargs) -> ComparisonResult:
       ds_a = load_from_store(store_a)
       ds_b = load_from_store(store_b)
       return compare_datasets(ds_a, ds_b, tier=ToleranceTier.SCIENTIFIC, ...)
   ```

2. Define SCIENTIFIC tolerances with justified `description` fields
3. Add integration tests in `packages/canvod-audit/tests/`
4. Document the runner's purpose and expected results
Adding a new tolerance¶
When a new variable or comparison type needs custom tolerances:
```python
from canvod.audit.tolerances import Tolerance

MY_TOLERANCES = {
    "new_variable": Tolerance(
        atol=0.01, rtol=1e-6, nan_rate_atol=0.05,
        description="Justified by: [physical reason]",
    ),
}
```
CI integration¶
```python
# tests/test_regression.py
import pytest

from canvod.audit.runners import audit_regression


@pytest.mark.integration
def test_regression(store_path, checkpoint_dir):
    result = audit_regression(store=store_path, checkpoint_dir=checkpoint_dir)
    assert result.passed, result.summary()
```
Design decisions¶
| Decision | Rationale |
|---|---|
| Separate workspace package | Audit code must not import from the package it audits (beyond xarray). Keeps core lean. |
| Frozen dataclasses for results | Immutable records, hashable, prevents accidental mutation |
| Polars for output tables | Consistent with canvodpy stack |
| `Tolerance.description` field | Every tolerance must be physically justified — extractable for methods sections |
| Four tiers | Maps directly to increasing levels of confidence: self → internal → regression → external |
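The frozen-dataclass choice can be illustrated with a short sketch (the class and field names here are hypothetical, not the actual `ComparisonResult` fields):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditRecord:
    passed: bool
    label: str
    failures: tuple[tuple[str, str], ...] = ()  # tuples keep the record hashable

rec = AuditRecord(passed=True, label="sbf vs rinex")

try:
    rec.passed = False  # frozen: any mutation raises
except Exception as exc:
    print(type(exc).__name__)  # FrozenInstanceError

# Frozen, eq-comparable dataclasses are hashable, so results can be cached or deduplicated
print(isinstance(hash(rec), int))  # True
```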
Next in the trail: Architecture · API Levels · AI Development