canvod-utils
Purpose
The canvod-utils package provides configuration management and command-line tooling for the canVODpy ecosystem. It implements a YAML-based configuration system with Pydantic validation, shared date utilities, and CLI shortcuts.
Configuration System
Three YAML files in config/ control all aspects of a canVODpy deployment:
- processing.yaml: Author metadata, NASA CDDIS credentials, auxiliary data agency, parallel processing limits, Icechunk compression, store strategies.
- sites.yaml: Research site definitions, including data root paths, receiver types, directory layout, SCS expansion (scs_from), and VOD analysis pairs.
- sids.yaml: Signal ID (SID) filtering, set to all, a named preset (e.g. gps_galileo), or a custom list of SIDs to retain.
User configuration overrides package defaults for any specified values. Unset keys fall back to bundled defaults.
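The override behaviour described above can be pictured as a recursive merge of two parsed YAML mappings. The following is only a minimal sketch under that assumption: `merge_config`, `DEFAULTS`, and `USER` are hypothetical names, the `rapid` value is purely illustrative, and the package's actual merge logic may differ.

```python
from copy import deepcopy
from typing import Any


def merge_config(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Layer user values over defaults, recursing into nested mappings."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists from the user file replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical bundled defaults and a partial user file, already parsed to dicts.
DEFAULTS = {
    "aux_data": {"agency": "COD", "product_type": "final"},
    "processing": {"batch_hours": 24, "resource_mode": "auto"},
}
USER = {"aux_data": {"product_type": "rapid"}}  # illustrative override only

resolved = merge_config(DEFAULTS, USER)
print(resolved["aux_data"])
print(resolved["processing"])
```

Only `aux_data.product_type` changes in `resolved`; `aux_data.agency` and the whole `processing` section keep their default values, and `DEFAULTS` itself is left untouched.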
Configuration Files
# config/processing.yaml
metadata:
  author: Your Name
  email: your.email@example.com
  institution: Your Institution
credentials:
  nasa_earthdata_acc_mail: null  # NASA CDDIS auth (optional)
aux_data:
  agency: COD
  product_type: final
processing:
  keep_rnx_vars: [SNR]
  aggregate_glonass_fdma: true
  store_radial_distance: false
  receiver_position_mode: shared  # or per_receiver
  file_pairing: complete  # or paired
  batch_hours: 24
  resource_mode: auto
  # n_max_threads: 8  # manual mode only
  # max_memory_gb: 16  # manual mode only
  # threads_per_worker: 1
  # cpu_affinity: [0, 1, 2, 3]
  # nice_priority: 10
preprocessing:
  temporal_aggregation:
    enabled: true
    freq: "1min"
    method: mean
  grid_assignment:
    enabled: true
    grid_type: equal_area
    angular_resolution: 2.0
compression:
  zlib: true
  complevel: 5
icechunk:
  compression_level: 5
  compression_algorithm: zstd
  inline_threshold: 512
  get_concurrency: 1
  # manifest_preload_enabled: false
  # manifest_preload_max_refs: 100000000
  # manifest_preload_pattern: "epoch|sid"
  chunk_strategies:
    rinex_store:
      epoch: 34560
      sid: -1
    vod_store:
      epoch: 34560
      sid: -1
storage:
  stores_root_dir: /path/to/your/gnss/stores
  rinex_store_strategy: skip
  vod_store_strategy: overwrite
# config/sites.yaml
sites:
  rosalia:
    base_dir: /path/to/rosalia
    receivers:
      reference_01:
        type: reference
        directory: 01_reference/01_GNSS/01_raw
        recipe: rosalia_reference.yaml
        reader_format: rinex3
      canopy_01:
        type: canopy
        directory: 02_canopy/01_GNSS/01_raw
        recipe: rosalia_canopy.yaml
        reader_format: auto
        scs_from: null  # use own position
      canopy_02:
        type: canopy
        directory: 02_canopy/02_GNSS/01_raw
    vod_analyses:
      canopy_01_vs_reference_01:
        canopy_receiver: canopy_01
        reference_receiver: reference_01
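To illustrate how a site definition ties receivers and VOD analysis pairs together, here is a small sketch that works on an already-parsed site mapping. It assumes each receiver `directory` is relative to the site's `base_dir` and that an analysis pair must name one canopy and one reference receiver; `receiver_dirs` and `check_vod_pairs` are hypothetical helpers, not part of the package.

```python
from pathlib import PurePosixPath

# A trimmed copy of the rosalia site, as it might look after YAML parsing.
SITE = {
    "base_dir": "/path/to/rosalia",
    "receivers": {
        "reference_01": {"type": "reference", "directory": "01_reference/01_GNSS/01_raw"},
        "canopy_01": {"type": "canopy", "directory": "02_canopy/01_GNSS/01_raw"},
    },
    "vod_analyses": {
        "canopy_01_vs_reference_01": {
            "canopy_receiver": "canopy_01",
            "reference_receiver": "reference_01",
        }
    },
}


def receiver_dirs(site: dict) -> dict:
    """Map receiver names to raw-data paths (assumes paths relative to base_dir)."""
    base = PurePosixPath(site["base_dir"])
    return {name: base / rx["directory"] for name, rx in site["receivers"].items()}


def check_vod_pairs(site: dict) -> list:
    """Return a list of problems found in the site's VOD analysis pairs."""
    problems = []
    receivers = site["receivers"]
    roles = (("canopy_receiver", "canopy"), ("reference_receiver", "reference"))
    for name, pair in site.get("vod_analyses", {}).items():
        for key, expected_type in roles:
            rx_name = pair.get(key)
            rx = receivers.get(rx_name)
            if rx is None:
                problems.append(f"{name}: {key} {rx_name!r} is not a defined receiver")
            elif rx["type"] != expected_type:
                problems.append(f"{name}: {rx_name!r} is not of type {expected_type!r}")
    return problems


print(receiver_dirs(SITE)["canopy_01"])
print(check_vod_pairs(SITE))
```

A check of this kind catches a misspelled receiver name in `vod_analyses` before any data is read.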
# config/sids.yaml
# Keep all signals
mode: all
# --- or named preset ---
# mode: preset
# preset: gps_galileo
# --- or explicit list ---
# mode: custom
# custom:
#   - "G01|L1|C"
#   - "E01|E1|C"
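The three SID modes can be sketched as a single filter function. This is an illustration only: `select_sids` and the `PRESETS` table are hypothetical, and the assumption that the `gps_galileo` preset keeps SIDs beginning with `G` or `E` is inferred from the preset's name rather than taken from the package.

```python
from typing import Optional, Sequence

# Assumption: a preset selects SIDs by their leading constellation letter.
PRESETS = {"gps_galileo": ("G", "E")}


def select_sids(
    available: Sequence[str],
    mode: str = "all",
    preset: Optional[str] = None,
    custom: Optional[Sequence[str]] = None,
) -> list:
    """Return the SIDs to retain for the given filtering mode."""
    if mode == "all":
        return list(available)
    if mode == "preset":
        if preset not in PRESETS:
            raise ValueError(f"unknown preset: {preset!r}")
        # str.startswith accepts a tuple of prefixes.
        return [sid for sid in available if sid.startswith(PRESETS[preset])]
    if mode == "custom":
        wanted = set(custom or [])
        # Preserve the order of the available SIDs.
        return [sid for sid in available if sid in wanted]
    raise ValueError(f"unknown mode: {mode!r}")


available = ["G01|L1|C", "E01|E1|C", "R01|G1|C"]
print(select_sids(available, mode="preset", preset="gps_galileo"))
print(select_sids(available, mode="custom", custom=["E01|E1|C"]))
```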
Loading Configuration
from canvod.utils.config import load_config
config = load_config()
# Access any section
author = config.processing.metadata.author
agency = config.processing.aux_data.agency
n_cores = config.processing.processing.n_max_threads
Validation at load time
All values are validated by Pydantic models when the configuration is loaded. Invalid emails, non-existent paths, and out-of-range parameters produce structured error messages immediately, rather than surfacing hours into a long processing run.
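As a rough illustration of load-time validation, the sketch below models a small slice of processing.yaml with Pydantic. The class names, the chosen fields, and the 0 to 9 bound on `complevel` (the usual zlib range) are assumptions made for this example; the package's real models are not shown here and are likely more extensive.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class AuxData(BaseModel):
    agency: str = "COD"
    product_type: str = "final"


class Compression(BaseModel):
    zlib: bool = True
    # Assumed bound: zlib compression levels run from 0 to 9.
    complevel: int = Field(5, ge=0, le=9)


class ProcessingSketch(BaseModel):
    """Hypothetical model covering a few keys of processing.yaml."""

    aux_data: AuxData = Field(default_factory=AuxData)
    compression: Compression = Field(default_factory=Compression)
    receiver_position_mode: Literal["shared", "per_receiver"] = "shared"


def validation_errors(raw: dict) -> list:
    """Return the dotted paths of all invalid fields, or an empty list."""
    try:
        ProcessingSketch(**raw)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


bad = {"compression": {"complevel": 42}, "receiver_position_mode": "each"}
print(validation_errors(bad))
```

Because every problem is collected in a single `ValidationError`, a user sees all invalid keys at once instead of fixing them one run at a time.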
CLI Quick Reference
just config-init # Copy .example templates + recipe files → config/
just config-edit processing # Open processing.yaml in $EDITOR
just config-edit sites # Open sites.yaml in $EDITOR
just config-validate # Validate all configuration files
just config-show # Display resolved configuration
just test # Run full test suite
just check # Lint + format + type-check
just hooks # Install pre-commit hooks
just docs # Serve documentation locally
just test-coverage # Tests with coverage report
just clean # Remove build artifacts
just process # Run full pipeline
just process-date YYYYDOY # Process single day
just process-range START END # Process date range
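The `YYYYDOY` argument to `just process-date` reads as a four-digit year followed by a three-digit day of year. Under that assumption, a date helper could convert between this form and a calendar date as sketched below; `parse_yyyydoy` and `to_yyyydoy` are hypothetical names, not the package's own date utilities.

```python
from datetime import date, datetime


def parse_yyyydoy(text: str) -> date:
    """Convert a 'YYYYDOY' string (year plus day of year) to a date."""
    if len(text) != 7 or not text.isdigit():
        raise ValueError(f"expected 7 digits in YYYYDOY form, got {text!r}")
    parsed = datetime.strptime(text, "%Y%j").date()
    # Guard against a day of year that spills into the following year.
    if parsed.year != int(text[:4]):
        raise ValueError(f"day of year out of range in {text!r}")
    return parsed


def to_yyyydoy(day: date) -> str:
    """Format a date as 'YYYYDOY' with a zero-padded day of year."""
    return day.strftime("%Y%j")


print(parse_yyyydoy("2024060"))
print(to_yyyydoy(date(2025, 1, 1)))
```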