
canvod.readers API Reference

RINEX observation file parsing with validation and GNSS signal specifications.

Package

GNSS data format readers.

This package provides readers for various GNSS data formats, all implementing a common interface for seamless integration with processing pipelines.

Supported formats:

- RINEX v3.04 (GNSS observations)
- More formats coming soon

Quick Start

from canvod.readers import Rnxv3Obs

# Read RINEX v3 file
reader = Rnxv3Obs(fpath="station.24o")
dataset = reader.to_ds()

Or use the canvodpy factory for automatic format detection:

from canvodpy import ReaderFactory

# Auto-detects format from file header
reader = ReaderFactory.create_from_file("station.24o")
dataset = reader.to_ds()

Directory Matching:

from pathlib import Path

from canvod.readers import DataDirMatcher

# Find dates with RINEX files in both receivers
matcher = DataDirMatcher(root=Path("/data/01_Rosalia"))
for matched_dirs in matcher:
    print(matched_dirs.yyyydoy)
    # Load RINEX files from matched_dirs.canopy_data_dir

GNSSDataReader

Bases: BaseModel, ABC

Abstract base class for all GNSS data format readers.

All readers must:

1. Inherit from this class
2. Implement all abstract methods
3. Return an xarray.Dataset that passes validate_dataset()
4. Provide a file hash for deduplication

This ensures compatibility with:

- canvod-vod: VOD calculation
- canvod-store: MyIcechunkStore storage
- canvod-grids: grid projection operations

Subclasses may override model_config to set frozen, extra, etc. The base class provides arbitrary_types_allowed=True, which is needed by readers that use pint.Quantity or similar third-party types.

Examples

>>> class Rnxv3Obs(GNSSDataReader):
...     def to_ds(self, **kwargs) -> xr.Dataset:
...         # Implementation
...         return dataset
...
>>> reader = Rnxv3Obs(fpath="station.24o")
>>> ds = reader.to_ds()
>>> validate_dataset(ds)

Source code in packages/canvod-readers/src/canvod/readers/base.py
class GNSSDataReader(BaseModel, ABC):
    """Abstract base class for all GNSS data format readers.

    All readers must:
    1. Inherit from this class
    2. Implement all abstract methods
    3. Return xarray.Dataset that passes :func:`validate_dataset`
    4. Provide file hash for deduplication

    This ensures compatibility with:
    - canvod-vod: VOD calculation
    - canvod-store: MyIcechunkStore storage
    - canvod-grids: Grid projection operations

    Subclasses may override ``model_config`` to set ``frozen``, ``extra``,
    etc.  The base class provides ``arbitrary_types_allowed=True`` which is
    needed by readers that use ``pint.Quantity`` or similar third-party types.

    Examples
    --------
    >>> class Rnxv3Obs(GNSSDataReader):
    ...     def to_ds(self, **kwargs) -> xr.Dataset:
    ...         # Implementation
    ...         return dataset
    ...
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> ds = reader.to_ds()
    >>> validate_dataset(ds)
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path

    @field_validator("fpath")
    @classmethod
    def _validate_fpath(cls, v: Path) -> Path:
        """Validate that the file path points to an existing file."""
        v = Path(v)
        if not v.is_file():
            raise FileNotFoundError(f"File not found: {v}")
        return v

    @property
    def source_format(self) -> str:
        """Return the format identifier for this reader (e.g. ``"rinex3"``, ``"sbf"``)."""
        return "rinex3"

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """Return SHA256 hash of file for deduplication.

        Used by MyIcechunkStore to avoid duplicate ingestion.
        Must be deterministic and reproducible.

        Returns
        -------
        str
            Short hash (16 chars) or full hash of file content
        """

    @abstractmethod
    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert data to xarray.Dataset.

        Must return Dataset with structure:
        - Dims: (epoch, sid)
        - Coords: epoch, sid, sv, system, band, code, freq_*
        - Data vars: At minimum SNR
        - Attrs: Must include "File Hash"

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to include. If None, includes all available.
        **kwargs
            Implementation-specific parameters

        Returns
        -------
        xr.Dataset
            Dataset that passes :func:`validate_dataset`.
        """

    @abstractmethod
    def iter_epochs(self) -> Iterator[object]:
        """Iterate over epochs in the file.

        Yields
        ------
        Epoch
            Parsed epoch with satellites and observations.
        """

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Produce the obs dataset and any auxiliary datasets in a single call.

        Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
        Readers that produce metadata (e.g. SBF) override this to collect both
        in a single file scan.

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
            readers with no extra data (RINEX v2/v3).
        """
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

    def _build_attrs(self) -> dict[str, str]:
        """Build standard global attributes for the output Dataset.

        Reads institution/author from config, adds timestamp, version,
        and the file hash.

        Returns
        -------
        dict[str, str]
            Ready-to-use attrs dict.
        """
        from canvod.readers.gnss_specs.metadata import get_global_attrs
        from canvod.readers.gnss_specs.utils import get_version_from_pyproject

        attrs = get_global_attrs()
        attrs["Created"] = datetime.now(UTC).isoformat()
        attrs["Software"] = (
            f"{attrs['Software']}, Version: {get_version_from_pyproject()}"
        )
        attrs["File Hash"] = self.file_hash
        return attrs

    @property
    @abstractmethod
    def start_time(self) -> datetime:
        """Return start time of observations.

        Returns
        -------
        datetime
            First observation timestamp in the file.
        """

    @property
    @abstractmethod
    def end_time(self) -> datetime:
        """Return end time of observations.

        Returns
        -------
        datetime
            Last observation timestamp in the file.
        """

    @property
    @abstractmethod
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
        """

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Default implementation iterates epochs.  Subclasses may override
        with a faster approach.

        Returns
        -------
        int
            Total number of observation epochs.
        """
        return sum(1 for _ in self.iter_epochs())

    @property
    @abstractmethod
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.
        """

    def __repr__(self) -> str:
        """Return the string representation."""
        return f"{self.__class__.__name__}(file='{self.fpath.name}')"

source_format property

Return the format identifier for this reader (e.g. "rinex3", "sbf").

file_hash abstractmethod property

Return SHA256 hash of file for deduplication.

Used by MyIcechunkStore to avoid duplicate ingestion. Must be deterministic and reproducible.

Returns

str
    Short hash (16 chars) or full hash of file content.

start_time abstractmethod property

Return start time of observations.

Returns

datetime
    First observation timestamp in the file.

end_time abstractmethod property

Return end time of observations.

Returns

datetime
    Last observation timestamp in the file.

systems abstractmethod property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'.

num_epochs property

Return number of epochs in file.

Default implementation iterates epochs. Subclasses may override with a faster approach.

Returns

int
    Total number of observation epochs.

num_satellites abstractmethod property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

to_ds(keep_data_vars=None, **kwargs) abstractmethod

Convert data to xarray.Dataset.

Must return a Dataset with structure:

- Dims: (epoch, sid)
- Coords: epoch, sid, sv, system, band, code, freq_*
- Data vars: at minimum SNR
- Attrs: must include "File Hash"

Parameters

keep_data_vars : list of str, optional
    Data variables to include. If None, includes all available.
**kwargs
    Implementation-specific parameters.

Returns

xr.Dataset
    Dataset that passes validate_dataset().

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert data to xarray.Dataset.

    Must return Dataset with structure:
    - Dims: (epoch, sid)
    - Coords: epoch, sid, sv, system, band, code, freq_*
    - Data vars: At minimum SNR
    - Attrs: Must include "File Hash"

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to include. If None, includes all available.
    **kwargs
        Implementation-specific parameters

    Returns
    -------
    xr.Dataset
        Dataset that passes :func:`validate_dataset`.
    """

iter_epochs() abstractmethod

Iterate over epochs in the file.

Yields

Epoch
    Parsed epoch with satellites and observations.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def iter_epochs(self) -> Iterator[object]:
    """Iterate over epochs in the file.

    Yields
    ------
    Epoch
        Parsed epoch with satellites and observations.
    """

to_ds_and_auxiliary(keep_data_vars=None, **kwargs)

Produce the obs dataset and any auxiliary datasets in a single call.

Default: calls to_ds(**kwargs) and returns an empty auxiliary dict. Readers that produce metadata (e.g. SBF) override this to collect both in a single file scan.

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"name": aux_ds, ...}). The auxiliary dict is empty for readers with no extra data (RINEX v2/v3).
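The override pattern for readers that do carry auxiliary data can be sketched with plain Python. Dataset creation is stubbed out with dicts so the example stands alone; SbfLikeReader and the "receiver_status" key are illustrative names, not part of the canvod API:

```python
class BaseReader:
    """Stand-in for GNSSDataReader's default behaviour."""

    def to_ds(self, keep_data_vars=None, **kwargs):
        # A real reader returns an xarray.Dataset; a dict stands in here.
        return {"SNR": [42.0]}

    def to_ds_and_auxiliary(self, keep_data_vars=None, **kwargs):
        # Default: obs dataset plus an empty auxiliary dict (RINEX v2/v3 case).
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}


class SbfLikeReader(BaseReader):
    """Hypothetical reader that collects obs and metadata in one file scan."""

    def to_ds_and_auxiliary(self, keep_data_vars=None, **kwargs):
        obs = self.to_ds(keep_data_vars=keep_data_vars, **kwargs)
        aux = {"receiver_status": {"temperature": [31.5]}}
        return obs, aux


obs_ds, aux = SbfLikeReader().to_ds_and_auxiliary()
```

The point of overriding rather than calling to_ds() twice is that formats like SBF interleave observations and metadata, so both can be collected in a single pass over the file.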

Source code in packages/canvod-readers/src/canvod/readers/base.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Produce the obs dataset and any auxiliary datasets in a single call.

    Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
    Readers that produce metadata (e.g. SBF) override this to collect both
    in a single file scan.

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
        readers with no extra data (RINEX v2/v3).
    """
    return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

__repr__()

Return the string representation.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def __repr__(self) -> str:
    """Return the string representation."""
    return f"{self.__class__.__name__}(file='{self.fpath.name}')"

SignalID

Bases: BaseModel

Validated signal identifier (SV + band + code).

>>> sid = SignalID(sv="G01", band="L1", code="C")
>>> str(sid)
'G01|L1|C'
>>> sid.system
'G'

Source code in packages/canvod-readers/src/canvod/readers/base.py
class SignalID(BaseModel):
    """Validated signal identifier (SV + band + code).

    >>> sid = SignalID(sv="G01", band="L1", code="C")
    >>> str(sid)
    'G01|L1|C'
    >>> sid.system
    'G'
    """

    model_config = ConfigDict(frozen=True)

    sv: str
    band: str
    code: str

    @field_validator("sv")
    @classmethod
    def _validate_sv(cls, v: str) -> str:
        if not SV_PATTERN.match(v):
            raise ValueError(
                f"Invalid SV: {v!r} — expected system letter + 2-digit PRN "
                f"(e.g. 'G01'). Valid systems: G, R, E, C, J, S, I"
            )
        return v

    @property
    def system(self) -> str:
        """GNSS system letter (e.g. 'G' for GPS)."""
        return self.sv[0]

    @property
    def sid(self) -> str:
        """Full signal ID string ('SV|band|code')."""
        return f"{self.sv}|{self.band}|{self.code}"

    def __str__(self) -> str:
        return self.sid

    def __hash__(self) -> int:
        return hash(self.sid)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SignalID):
            return self.sid == other.sid
        return NotImplemented

    @classmethod
    def from_string(cls, sid_str: str) -> SignalID:
        """Parse a signal ID string ('SV|band|code') into a SignalID.

        Parameters
        ----------
        sid_str : str
            Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

        Returns
        -------
        SignalID
            Validated signal identifier.

        Raises
        ------
        ValueError
            If the string does not have exactly three pipe-separated parts.
        """
        parts = sid_str.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
            )
        return cls(sv=parts[0], band=parts[1], code=parts[2])

system property

GNSS system letter (e.g. 'G' for GPS).

sid property

Full signal ID string ('SV|band|code').

from_string(sid_str) classmethod

Parse a signal ID string ('SV|band|code') into a SignalID.

Parameters

sid_str : str
    Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

Returns

SignalID
    Validated signal identifier.

Raises

ValueError
    If the string does not have exactly three pipe-separated parts.
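The parsing and SV validation can be reproduced standalone. SV_PATTERN itself is defined elsewhere in base.py and is not shown in this excerpt; the regex below is an assumed equivalent (one system letter from G/R/E/C/J/S/I plus a 2-digit PRN):

```python
import re

# Assumed equivalent of base.py's SV_PATTERN (not shown in this excerpt).
SV_PATTERN = re.compile(r"^[GRECJSI]\d{2}$")


def parse_sid(sid_str: str) -> tuple[str, str, str]:
    """Split 'SV|band|code' and validate the SV part, mirroring from_string()."""
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} (expected 'SV|band|code')"
        )
    sv, band, code = parts
    if not SV_PATTERN.match(sv):
        raise ValueError(f"Invalid SV: {sv!r}")
    return sv, band, code
```

With this sketch, parse_sid("G01|L1|C") yields the three components, while a malformed string such as "G01|L1" or an unknown system letter raises ValueError, matching the behaviour documented above.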

Source code in packages/canvod-readers/src/canvod/readers/base.py
@classmethod
def from_string(cls, sid_str: str) -> SignalID:
    """Parse a signal ID string ('SV|band|code') into a SignalID.

    Parameters
    ----------
    sid_str : str
        Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

    Returns
    -------
    SignalID
        Validated signal identifier.

    Raises
    ------
    ValueError
        If the string does not have exactly three pipe-separated parts.
    """
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
        )
    return cls(sv=parts[0], band=parts[1], code=parts[2])

DatasetBuilder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Parameters

reader : GNSSDataReader
    The reader instance (used for _build_attrs() and file hash).
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels (default True).

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()  # validated Dataset

Source code in packages/canvod-readers/src/canvod/readers/builder.py
class DatasetBuilder:
    """Guided builder for constructing valid GNSSDataReader output Datasets.

    Handles coordinate arrays, dtype enforcement, frequency resolution,
    and contract validation automatically.

    Parameters
    ----------
    reader : GNSSDataReader
        The reader instance (used for ``_build_attrs()`` and file hash).
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels (default True).

    Examples
    --------
    >>> builder = DatasetBuilder(reader)
    >>> for epoch in reader.iter_epochs():
    ...     ei = builder.add_epoch(epoch.timestamp)
    ...     for obs in epoch.observations:
    ...         sig = builder.add_signal(sv="G01", band="L1", code="C")
    ...         builder.set_value(ei, sig, "SNR", 42.0)
    >>> ds = builder.build()   # validated Dataset
    """

    def __init__(
        self,
        reader: GNSSDataReader,
        *,
        aggregate_glonass_fdma: bool = True,
    ) -> None:
        self._reader = reader
        self._mapper = SignalIDMapper(aggregate_glonass_fdma=aggregate_glonass_fdma)
        self._signals: dict[str, SignalID] = {}
        self._epochs: list[datetime] = []
        self._values: dict[str, dict[tuple[int, str], float]] = {}

    def add_epoch(self, timestamp: datetime) -> int:
        """Register an epoch timestamp. Returns epoch index."""
        self._epochs.append(timestamp)
        return len(self._epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> SignalID:
        """Register a signal (idempotent). Returns validated SignalID."""
        sig = SignalID(sv=sv, band=band, code=code)
        self._signals[sig.sid] = sig
        return sig

    def set_value(
        self,
        epoch_idx: int,
        signal: SignalID | str,
        var: str,
        value: float,
    ) -> None:
        """Set a data value for a given epoch, signal, and variable.

        Parameters
        ----------
        epoch_idx : int
            Index returned by :meth:`add_epoch`.
        signal : SignalID or str
            Signal identifier (SignalID or 'SV|band|code' string).
        var : str
            Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
        value : float
            The observation value.
        """
        sid = str(signal)
        if var not in self._values:
            self._values[var] = {}
        self._values[var][(epoch_idx, sid)] = value

    def build(
        self,
        keep_data_vars: list[str] | None = None,
        extra_attrs: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Build, validate, and return the Dataset.

        1. Sorts signals alphabetically
        2. Resolves frequencies from band names via SignalIDMapper
        3. Constructs coordinate arrays with correct dtypes (float32 for freq)
        4. Attaches CF-compliant metadata from COORDS_METADATA
        5. Calls validate_dataset() before returning

        Parameters
        ----------
        keep_data_vars : list of str, optional
            If provided, only include these data variables.  If ``None``,
            includes all variables that had values set.
        extra_attrs : dict, optional
            Additional global attributes to merge into the Dataset.

        Returns
        -------
        xr.Dataset
            Validated Dataset with dimensions ``(epoch, sid)``.
        """
        sorted_sids = sorted(self._signals)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(self._epochs)
        n_sids = len(sorted_sids)

        # --- Coordinate arrays ---
        epoch_arr = [
            np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
            for ts in self._epochs
        ]
        sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
        system_arr = np.array(
            [self._signals[s].system for s in sorted_sids], dtype=object
        )
        band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
        code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

        # Frequency resolution via SignalIDMapper
        freq_center = np.array(
            [
                self._mapper.get_band_frequency(self._signals[s].band) or np.nan
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        bandwidths = np.array(
            [
                self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        freq_min = (freq_center - bandwidths / 2).astype(np.float32)
        freq_max = (freq_center + bandwidths / 2).astype(np.float32)

        # --- Determine which variables to include ---
        all_vars = set(self._values.keys())
        if keep_data_vars is not None:
            vars_to_build = [v for v in keep_data_vars if v in all_vars]
        else:
            vars_to_build = sorted(all_vars)

        # --- Data variable arrays ---
        data_vars: dict[str, tuple] = {}
        for var in vars_to_build:
            dtype = DTYPES.get(var, np.dtype("float32"))
            fill = np.nan if np.issubdtype(dtype, np.floating) else -1
            arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

            for (ei, sid_str), val in self._values[var].items():
                if sid_str in sid_to_idx:
                    arr[ei, sid_to_idx[sid_str]] = val

            meta = _VAR_METADATA.get(var, {})
            data_vars[var] = (("epoch", "sid"), arr, meta)

        # --- Coordinates ---
        coords = {
            "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
            "system": ("sid", system_arr, COORDS_METADATA["system"]),
            "band": ("sid", band_arr, COORDS_METADATA["band"]),
            "code": ("sid", code_arr, COORDS_METADATA["code"]),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        # --- Global attributes ---
        attrs = self._reader._build_attrs()
        if extra_attrs:
            attrs.update(extra_attrs)

        ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

        # Validate before returning
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

add_epoch(timestamp)

Register an epoch timestamp. Returns epoch index.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_epoch(self, timestamp: datetime) -> int:
    """Register an epoch timestamp. Returns epoch index."""
    self._epochs.append(timestamp)
    return len(self._epochs) - 1

add_signal(sv, band, code)

Register a signal (idempotent). Returns validated SignalID.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_signal(self, sv: str, band: str, code: str) -> SignalID:
    """Register a signal (idempotent). Returns validated SignalID."""
    sig = SignalID(sv=sv, band=band, code=code)
    self._signals[sig.sid] = sig
    return sig

set_value(epoch_idx, signal, var, value)

Set a data value for a given epoch, signal, and variable.

Parameters

epoch_idx : int
    Index returned by add_epoch().
signal : SignalID or str
    Signal identifier (SignalID or 'SV|band|code' string).
var : str
    Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
value : float
    The observation value.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def set_value(
    self,
    epoch_idx: int,
    signal: SignalID | str,
    var: str,
    value: float,
) -> None:
    """Set a data value for a given epoch, signal, and variable.

    Parameters
    ----------
    epoch_idx : int
        Index returned by :meth:`add_epoch`.
    signal : SignalID or str
        Signal identifier (SignalID or 'SV|band|code' string).
    var : str
        Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
    value : float
        The observation value.
    """
    sid = str(signal)
    if var not in self._values:
        self._values[var] = {}
    self._values[var][(epoch_idx, sid)] = value

build(keep_data_vars=None, extra_attrs=None)

Build, validate, and return the Dataset.

1. Sorts signals alphabetically
2. Resolves frequencies from band names via SignalIDMapper
3. Constructs coordinate arrays with correct dtypes (float32 for freq)
4. Attaches CF-compliant metadata from COORDS_METADATA
5. Calls validate_dataset() before returning

Parameters

keep_data_vars : list of str, optional
    If provided, only include these data variables. If None, includes all variables that had values set.
extra_attrs : dict, optional
    Additional global attributes to merge into the Dataset.

Returns

xr.Dataset
    Validated Dataset with dimensions (epoch, sid).
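The core densification step in build() can be isolated into a few lines: the sparse {(epoch_idx, sid): value} mapping accumulated by set_value() becomes a (n_epochs, n_sids) array with NaN wherever no value was set. The values below are made-up sample data:

```python
import numpy as np

# Sparse observations keyed by (epoch index, signal ID string).
values = {
    (0, "G01|L1|C"): 42.0,
    (1, "G01|L1|C"): 43.5,
    (1, "E05|E1|C"): 40.0,
}

# Sort signals alphabetically and map each SID to a column index,
# as build() does before allocating data arrays.
sorted_sids = sorted({sid for (_, sid) in values})
sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
n_epochs = 2

# NaN fill marks (epoch, signal) pairs with no observation.
arr = np.full((n_epochs, len(sorted_sids)), np.nan, dtype=np.float32)
for (ei, sid), val in values.items():
    arr[ei, sid_to_idx[sid]] = val
```

Here arr has one row per epoch and one column per signal; G01|L1|C has values at both epochs while E05|E1|C is NaN at epoch 0.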

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def build(
    self,
    keep_data_vars: list[str] | None = None,
    extra_attrs: dict[str, str] | None = None,
) -> xr.Dataset:
    """Build, validate, and return the Dataset.

    1. Sorts signals alphabetically
    2. Resolves frequencies from band names via SignalIDMapper
    3. Constructs coordinate arrays with correct dtypes (float32 for freq)
    4. Attaches CF-compliant metadata from COORDS_METADATA
    5. Calls validate_dataset() before returning

    Parameters
    ----------
    keep_data_vars : list of str, optional
        If provided, only include these data variables.  If ``None``,
        includes all variables that had values set.
    extra_attrs : dict, optional
        Additional global attributes to merge into the Dataset.

    Returns
    -------
    xr.Dataset
        Validated Dataset with dimensions ``(epoch, sid)``.
    """
    sorted_sids = sorted(self._signals)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(self._epochs)
    n_sids = len(sorted_sids)

    # --- Coordinate arrays ---
    epoch_arr = [
        np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
        for ts in self._epochs
    ]
    sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
    system_arr = np.array(
        [self._signals[s].system for s in sorted_sids], dtype=object
    )
    band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
    code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

    # Frequency resolution via SignalIDMapper
    freq_center = np.array(
        [
            self._mapper.get_band_frequency(self._signals[s].band) or np.nan
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    bandwidths = np.array(
        [
            self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    freq_min = (freq_center - bandwidths / 2).astype(np.float32)
    freq_max = (freq_center + bandwidths / 2).astype(np.float32)

    # --- Determine which variables to include ---
    all_vars = set(self._values.keys())
    if keep_data_vars is not None:
        vars_to_build = [v for v in keep_data_vars if v in all_vars]
    else:
        vars_to_build = sorted(all_vars)

    # --- Data variable arrays ---
    data_vars: dict[str, tuple] = {}
    for var in vars_to_build:
        dtype = DTYPES.get(var, np.dtype("float32"))
        fill = np.nan if np.issubdtype(dtype, np.floating) else -1
        arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

        for (ei, sid_str), val in self._values[var].items():
            if sid_str in sid_to_idx:
                arr[ei, sid_to_idx[sid_str]] = val

        meta = _VAR_METADATA.get(var, {})
        data_vars[var] = (("epoch", "sid"), arr, meta)

    # --- Coordinates ---
    coords = {
        "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
        "system": ("sid", system_arr, COORDS_METADATA["system"]),
        "band": ("sid", band_arr, COORDS_METADATA["band"]),
        "code": ("sid", code_arr, COORDS_METADATA["code"]),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    # --- Global attributes ---
    attrs = self._reader._build_attrs()
    if extra_attrs:
        attrs.update(extra_attrs)

    ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

    # Validate before returning
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

DatasetStructureValidator

Bases: BaseModel

Validates that an xarray.Dataset meets the GNSSDataReader contract.

Wraps a Dataset and checks it against the contract constants above. Use this in tests and reader implementations to catch structural errors early with clear messages.

Examples

validator = DatasetStructureValidator(dataset=ds)
validator.validate_all()          # raises ValueError on any violation
validator.validate_dimensions()   # check just one aspect

Source code in packages/canvod-readers/src/canvod/readers/base.py
class DatasetStructureValidator(BaseModel):
    """Validates that an xarray.Dataset meets the GNSSDataReader contract.

    Wraps a Dataset and checks it against the contract constants above.
    Use this in tests and reader implementations to catch structural errors
    early with clear messages.

    Examples
    --------
    >>> validator = DatasetStructureValidator(dataset=ds)
    >>> validator.validate_all()          # raises ValueError on any violation
    >>> validator.validate_dimensions()   # check just one aspect
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    dataset: xr.Dataset

    def validate_all(self, required_vars: list[str] | None = None) -> None:
        """Run all validations, collecting **all** errors.

        Delegates to :func:`validate_dataset` so the logic lives in one place.
        """
        validate_dataset(self.dataset, required_vars=required_vars)

    def validate_dimensions(self) -> None:
        """Check that required dimensions (epoch, sid) exist."""
        missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
        if missing:
            raise ValueError(f"Missing required dimensions: {missing}")

    def validate_coordinates(self) -> None:
        """Check that required coordinates exist with correct dtypes."""
        for coord, expected_dtype in REQUIRED_COORDS.items():
            if coord not in self.dataset.coords:
                raise ValueError(f"Missing required coordinate: {coord}")
            actual = str(self.dataset[coord].dtype)
            if expected_dtype == "object":
                is_valid_string = actual == "object" or actual.startswith("StringDType")
                if not is_valid_string:
                    raise ValueError(
                        f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                    )
            elif expected_dtype not in actual:
                raise ValueError(
                    f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
                )

    def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
        """Check that required data variables exist with correct dims."""
        if required_vars is None:
            required_vars = list(DEFAULT_REQUIRED_VARS)
        missing = set(required_vars) - set(self.dataset.data_vars)
        if missing:
            raise ValueError(f"Missing required data variables: {missing}")
        for var in self.dataset.data_vars:
            if self.dataset[var].dims != REQUIRED_DIMS:
                raise ValueError(
                    f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                    f"got {self.dataset[var].dims}"
                )

    def validate_attributes(self) -> None:
        """Check that required global attributes are present."""
        missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")

validate_all(required_vars=None)

Run all validations, collecting all errors.

Delegates to :func:validate_dataset so the logic lives in one place.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_all(self, required_vars: list[str] | None = None) -> None:
    """Run all validations, collecting **all** errors.

    Delegates to :func:`validate_dataset` so the logic lives in one place.
    """
    validate_dataset(self.dataset, required_vars=required_vars)

validate_dimensions()

Check that required dimensions (epoch, sid) exist.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dimensions(self) -> None:
    """Check that required dimensions (epoch, sid) exist."""
    missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
    if missing:
        raise ValueError(f"Missing required dimensions: {missing}")

validate_coordinates()

Check that required coordinates exist with correct dtypes.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_coordinates(self) -> None:
    """Check that required coordinates exist with correct dtypes."""
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in self.dataset.coords:
            raise ValueError(f"Missing required coordinate: {coord}")
        actual = str(self.dataset[coord].dtype)
        if expected_dtype == "object":
            is_valid_string = actual == "object" or actual.startswith("StringDType")
            if not is_valid_string:
                raise ValueError(
                    f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                )
        elif expected_dtype not in actual:
            raise ValueError(
                f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
            )

validate_data_variables(required_vars=None)

Check that required data variables exist with correct dims.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
    """Check that required data variables exist with correct dims."""
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)
    missing = set(required_vars) - set(self.dataset.data_vars)
    if missing:
        raise ValueError(f"Missing required data variables: {missing}")
    for var in self.dataset.data_vars:
        if self.dataset[var].dims != REQUIRED_DIMS:
            raise ValueError(
                f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                f"got {self.dataset[var].dims}"
            )

validate_attributes()

Check that required global attributes are present.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_attributes(self) -> None:
    """Check that required global attributes are present."""
    missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
    if missing:
        raise ValueError(f"Missing required attributes: {missing}")

Rnxv3Obs

Bases: GNSSDataReader

RINEX v3.04 observation reader.

Attributes

fpath : Path
    Path to the RINEX observation file.
polarization : str, default "RHCP"
    Polarization label for observables.
completeness_mode : {"strict", "warn", "off"}, default "strict"
    Behavior when epoch completeness checks fail.
expected_dump_interval : str or pint.Quantity, optional
    Expected file dump interval for completeness validation.
expected_sampling_interval : str or pint.Quantity, optional
    Expected sampling interval for completeness validation.
apply_overlap_filter : bool, default False
    Whether to filter overlapping signal groups.
overlap_preferences : dict[str, str], optional
    Preferred signals for overlap resolution.
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels.

Notes

Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
class Rnxv3Obs(GNSSDataReader):
    """RINEX v3.04 observation reader.

    Attributes
    ----------
    fpath : Path
        Path to the RINEX observation file.
    polarization : str, default "RHCP"
        Polarization label for observables.
    completeness_mode : {"strict", "warn", "off"}, default "strict"
        Behavior when epoch completeness checks fail.
    expected_dump_interval : str or pint.Quantity, optional
        Expected file dump interval for completeness validation.
    expected_sampling_interval : str or pint.Quantity, optional
        Expected sampling interval for completeness validation.
    apply_overlap_filter : bool, default False
        Whether to filter overlapping signal groups.
    overlap_preferences : dict[str, str], optional
        Preferred signals for overlap resolution.
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels.

    Notes
    -----
    Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
    from :class:`GNSSDataReader`.

    """

    model_config = ConfigDict(frozen=True)

    polarization: str = "RHCP"

    completeness_mode: Literal["strict", "warn", "off"] = "strict"
    expected_dump_interval: str | pint.Quantity | None = None
    expected_sampling_interval: str | pint.Quantity | None = None

    apply_overlap_filter: bool = False
    overlap_preferences: dict[str, str] | None = None

    aggregate_glonass_fdma: bool = True

    _header: Rnxv3Header = PrivateAttr()
    _signal_mapper: SignalIDMapper = PrivateAttr()

    _lines: list[str] = PrivateAttr()
    _file_hash: str = PrivateAttr()
    _cached_epoch_batches: list[tuple[int, int]] | None = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _post_init(self) -> Self:
        """Initialize derived state after validation."""
        # Load header once
        self._header = Rnxv3Header.from_file(self.fpath)

        # Initialize signal mapper
        self._signal_mapper = SignalIDMapper(
            aggregate_glonass_fdma=self.aggregate_glonass_fdma
        )

        # Optionally auto-check completeness
        if self.completeness_mode != "off":
            try:
                self.validate_epoch_completeness(
                    dump_interval=self.expected_dump_interval,
                    sampling_interval=self.expected_sampling_interval,
                )
            except MissingEpochError as e:
                if self.completeness_mode == "strict":
                    raise
                warnings.warn(str(e), RuntimeWarning, stacklevel=2)

        # Cache file lines
        self._lines = self._load_file()

        return self

    @property
    def header(self) -> Rnxv3Header:
        """Expose validated header (read-only).

        Returns
        -------
        Rnxv3Header
            Parsed and validated RINEX header.

        """
        return self._header

    def __str__(self) -> str:
        """Return a human-readable summary."""
        return (
            f"{self.__class__.__name__}:\n"
            f"  File Path: {self.fpath}\n"
            f"  Header: {self.header}\n"
            f"  Polarization: {self.polarization}\n"
        )

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return f"{self.__class__.__name__}(fpath={self.fpath})"

    def _load_file(self) -> list[str]:
        """Read file once, cache lines, and compute hash.

        Returns
        -------
        list[str]
            File contents split into lines.

        """
        if not hasattr(self, "_lines"):
            h = hashlib.sha256()
            with self.fpath.open("rb") as f:  # binary mode for consistent hash
                data = f.read()
                h.update(data)
                self._lines = data.decode("utf-8", errors="replace").splitlines()
            self._file_hash = h.hexdigest()[:16]  # short hash for storage
        return self._lines

    @property
    def file_hash(self) -> str:
        """Return cached SHA256 short hash of the file content.

        Returns
        -------
        str
            16-character short hash for deduplication.

        """
        return self._file_hash

    @property
    def start_time(self) -> datetime:
        """Return start time of observations from header.

        Returns
        -------
        datetime
            First observation timestamp.

        """
        return min(self.header.t0.values())

    @property
    def end_time(self) -> datetime:
        """Return end time of observations from last epoch.

        Returns
        -------
        datetime
            Last observation timestamp.

        """
        last_epoch = None
        for epoch in self.iter_epochs():
            last_epoch = epoch
        if last_epoch:
            return self.get_datetime_from_epoch_record_info(last_epoch.info)
        return self.start_time

    @property
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers (G, R, E, C, J, S, I).

        """
        if self.header.systems == "M":
            return list(self.header.obs_codes_per_system.keys())
        return [self.header.systems]

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Returns
        -------
        int
            Total epoch count.

        """
        return len(list(self.get_epoch_record_batches()))

    @property
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.

        """
        satellites = set()
        for epoch in self.iter_epochs():
            for sat in epoch.data:
                satellites.add(sat.sv)
        return len(satellites)

    def get_epoch_record_batches(
        self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
    ) -> list[tuple[int, int]]:
        """Get the start and end line numbers for each epoch in the file.

        Parameters
        ----------
        epoch_record_indicator : str, default '>'
            Character marking epoch record lines.

        Returns
        -------
        list of tuple of int
            List of (start_line, end_line) pairs for each epoch.

        """
        if self._cached_epoch_batches is not None:
            return self._cached_epoch_batches

        lines = self._load_file()
        starts = [
            i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
        ]
        starts.append(len(lines))  # Add EOF
        self._cached_epoch_batches = [
            (start, starts[i + 1])
            for i, start in enumerate(starts)
            if i + 1 < len(starts)
        ]
        return self._cached_epoch_batches

    def parse_observation_slice(
        self,
        slice_text: str,
    ) -> tuple[float | None, int | None, int | None]:
        """Parse a RINEX observation slice into value, LLI, and SSI.

        Enhanced to handle both standard 16-character format and
        variable-length records.

        Parameters
        ----------
        slice_text : str
            Observation slice to parse.

        Returns
        -------
        tuple[float | None, int | None, int | None]
            Parsed (value, LLI, SSI) tuple.

        """
        if not slice_text or not slice_text.strip():
            return None, None, None

        try:
            # Method 1: Standard RINEX format with decimal at position -6
            if (
                len(slice_text) >= OBS_SLICE_MIN_LEN
                and len(slice_text) <= OBS_SLICE_MAX_LEN
                and slice_text[OBS_SLICE_DECIMAL_POS] == "."
            ):
                slice_chars = list(slice_text)
                ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
                lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

                # Convert LLI and SSI
                lli = int(lli) if lli.strip() and lli.isdigit() else None
                ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

                # Convert value
                value_str = "".join(slice_chars).strip()
                if value_str:
                    value = float(value_str)
                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        try:
            # Method 2: Flexible parsing for variable-length records
            slice_trimmed = slice_text.strip()
            if not slice_trimmed:
                return None, None, None

            # Look for a decimal point to identify the numeric value
            if "." in slice_trimmed:
                # Find the main numeric value (supports negative numbers)
                number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

                if number_match:
                    value = float(number_match.group(1))

                    # Check for LLI/SSI indicators after the number
                    remaining_part = slice_trimmed[number_match.end() :].strip()
                    lli = None
                    ssi = None

                    # Parse remaining characters as potential LLI/SSI
                    if remaining_part:
                        # Could be just SSI, or LLI followed by SSI
                        if len(remaining_part) == 1:
                            # Just one indicator - assume it's SSI
                            if remaining_part.isdigit():
                                ssi = int(remaining_part)
                        elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                            # Two or more characters - take last two as LLI, SSI
                            lli_char = remaining_part[-2]
                            ssi_char = remaining_part[-1]

                            if lli_char.isdigit():
                                lli = int(lli_char)
                            if ssi_char.isdigit():
                                ssi = int(ssi_char)

                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        # Method 3: Last resort - try simple float parsing
        try:
            simple_value = float(slice_text.strip())
            return simple_value, None, None
        except ValueError:
            pass

        return None, None, None

    def process_satellite_data(self, s: str) -> Satellite:
        """Process satellite data line into a Satellite object with observations.

        Handles variable-length observation records correctly by adaptively parsing
        based on the actual line length and content.
        """
        sv = s[:3].strip()
        satellite = Satellite(sv=sv)
        bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

        # Get the data part (after sv identifier)
        data_part = s[3:]

        # Process each observation adaptively
        for i, band in enumerate(bands_tbe):
            start_idx = i * 16
            end_idx = start_idx + 16

            # Check if we have enough data for this observation
            if start_idx >= len(data_part):
                # No more data available - create empty observation
                observation = Observation(
                    obs_type=band.split("|")[1][0],
                    value=None,
                    lli=None,
                    ssi=None,
                )
                satellite.add_observation(observation)
                continue

            # Extract the slice, but handle variable length
            if end_idx <= len(data_part):
                # Full 16-character slice available
                slice_data = data_part[start_idx:end_idx]
            else:
                # Partial slice - pad with spaces to maintain consistency
                available_slice = data_part[start_idx:]
                slice_data = available_slice.ljust(16)  # Pad with spaces if needed

            value, lli, ssi = self.parse_observation_slice(slice_data)

            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=value,
                lli=lli,
                ssi=ssi,
            )
            satellite.add_observation(observation)

        return satellite

    @property
    def epochs(self) -> list[Rnxv3ObsEpochRecord]:
        """Materialize all epochs (legacy compatibility).

        Returns
        -------
        list of Rnxv3ObsEpochRecord
            All epochs in memory (use :meth:`iter_epochs` for efficiency).

        """
        return list(self.iter_epochs())

    def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
        """Yield epochs one by one instead of materializing the whole list.

        Returns
        -------
        Generator
            Generator yielding Rnxv3ObsEpochRecord objects

        Yields
        ------
        Rnxv3ObsEpochRecord
            Each epoch with timestamp and satellite observations

        """
        for start, end in self.get_epoch_record_batches():
            try:
                info = Rnxv3ObsEpochRecordLineModel.model_validate(
                    {"epoch": self._lines[start]}
                )

                # Skip event epochs (flag 2-6: special records, not observations)
                if info.epoch_flag > 1:
                    continue

                # Filter out blank/whitespace-only lines from data slice
                data = [line for line in self._lines[start + 1 : end] if line.strip()]
                epoch = Rnxv3ObsEpochRecord(
                    info=info,
                    data=[self.process_satellite_data(line) for line in data],
                )
                yield epoch
            except (InvalidEpochError, IncompleteEpochError, ValueError):
                # Skip epochs with validation errors (invalid SV, malformed data,
                # pydantic ValidationError inherits from ValueError)
                pass

    def iter_epochs_in_range(
        self,
        start: datetime,
        end: datetime,
    ) -> Iterable[Rnxv3ObsEpochRecord]:
        """Yield epochs lazily that fall into the given datetime range.

        Parameters
        ----------
        start : datetime
            Start of time range (inclusive)
        end : datetime
            End of time range (inclusive)

        Returns
        -------
        Generator
            Generator yielding epochs in the specified range

        Yields
        ------
        Rnxv3ObsEpochRecord
            Epochs within the time range

        """
        for epoch in self.iter_epochs():
            dt = self.get_datetime_from_epoch_record_info(epoch.info)
            if start <= dt <= end:
                yield epoch

    def get_datetime_from_epoch_record_info(
        self,
        epoch_record_info: Rnxv3ObsEpochRecordLineModel,
    ) -> datetime:
        """Convert epoch record info to datetime object.

        Parameters
        ----------
        epoch_record_info : Rnxv3ObsEpochRecordLineModel
            Parsed epoch record line

        Returns
        -------
        datetime
            Timestamp from epoch record

        """
        return datetime(
            year=int(epoch_record_info.year),
            month=int(epoch_record_info.month),
            day=int(epoch_record_info.day),
            hour=int(epoch_record_info.hour),
            minute=int(epoch_record_info.minute),
            second=int(epoch_record_info.seconds),
            tzinfo=UTC,
        )

    @staticmethod
    def epochrecordinfo_dt_to_numpy_dt(
        epch: Rnxv3ObsEpochRecord,
    ) -> np.datetime64:
        """Convert Python datetime to numpy datetime64[ns].

        Parameters
        ----------
        epch : Rnxv3ObsEpochRecord
            Epoch record containing timestamp info

        Returns
        -------
        np.datetime64
            Numpy datetime64 with nanosecond precision

        """
        dt = datetime(
            year=int(epch.info.year),
            month=int(epch.info.month),
            day=int(epch.info.day),
            hour=int(epch.info.hour),
            minute=int(epch.info.minute),
            second=int(epch.info.seconds),
            tzinfo=UTC,
        )
        # np.datetime64 doesn't support timezone info, but datetime is already UTC
        # Convert to naive datetime (UTC) to avoid warning
        return np.datetime64(dt.replace(tzinfo=None), "ns")

    def _epoch_datetimes(self) -> list[datetime]:
        """Extract epoch datetimes from the file.

        Uses the same epoch parsing logic already implemented.
        """
        dts: list[datetime] = []

        for start, _end in self.get_epoch_record_batches():
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )
            dts.append(
                datetime(
                    year=int(info.year),
                    month=int(info.month),
                    day=int(info.day),
                    hour=int(info.hour),
                    minute=int(info.minute),
                    second=int(info.seconds),
                    tzinfo=UTC,
                )
            )
        return dts

    def infer_sampling_interval(self) -> pint.Quantity | None:
        """Infer sampling interval from consecutive epoch deltas.

        Returns
        -------
        pint.Quantity or None
            Sampling interval in seconds, or None if cannot be inferred

        """
        dts = self._epoch_datetimes()
        if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
            return None
        # Compute deltas
        deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
        if not deltas:
            return None
        # Pick the most common delta (robust to an occasional missing epoch)
        seconds = Counter(
            int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
        )
        if not seconds:
            return None
        mode_seconds, _ = seconds.most_common(1)[0]
        return (mode_seconds * UREG.second).to(UREG.seconds)

    def infer_dump_interval(
        self, sampling_interval: pint.Quantity | None = None
    ) -> pint.Quantity | None:
        """Infer the intended dump interval for the RINEX file.

        Parameters
        ----------
        sampling_interval : pint.Quantity, optional
            Known sampling interval. If provided, returns (#epochs * sampling_interval)

        Returns
        -------
        pint.Quantity or None
            Dump interval in seconds, or None if cannot be inferred

        """
        idx = self.get_epoch_record_batches()
        n_epochs = len(idx)
        if n_epochs == 0:
            return None

        if sampling_interval is not None:
            return (n_epochs * sampling_interval).to(UREG.seconds)

        # Fallback: estimate the step from epoch deltas, then scale by epoch count
        dts = self._epoch_datetimes()
        if len(dts) == 0:
            return None
        if len(dts) == 1:
            # Single epoch: the step is unknown, so the interval cannot be inferred
            return None

        # Estimate step from data
        est_step = self.infer_sampling_interval()
        if est_step is None:
            return None

        # Inclusive coverage often equals (n_epochs - 1) * step; intended
        # dump interval is n_epochs * step.
        return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

    def validate_epoch_completeness(
        self,
        dump_interval: str | pint.Quantity | None = None,
        sampling_interval: str | pint.Quantity | None = None,
    ) -> None:
        """Validate that the number of epochs matches the expected dump interval.

        Parameters
        ----------
        dump_interval : str or pint.Quantity, optional
            Expected file dump interval. If None, inferred from epochs.
        sampling_interval : str or pint.Quantity, optional
            Expected sampling interval. If None, inferred from epochs.

        Returns
        -------
        None

        Raises
        ------
        MissingEpochError
            If total sampling time doesn't match dump interval
        ValueError
            If intervals cannot be inferred

        """
        # Normalize/Infer sampling interval
        if sampling_interval is None:
            inferred = self.infer_sampling_interval()
            if inferred is None:
                msg = "Could not infer sampling interval from epochs"
                raise ValueError(msg)
            sampling_interval = inferred
        # normalize to pint
        elif not isinstance(sampling_interval, pint.Quantity):
            sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

        # Normalize/Infer dump interval
        if dump_interval is None:
            inferred_dump = self.infer_dump_interval(
                sampling_interval=sampling_interval
            )
            if inferred_dump is None:
                msg = "Could not infer dump interval from file"
                raise ValueError(msg)
            dump_interval = inferred_dump
        elif not isinstance(dump_interval, pint.Quantity):
            # Accept '15 min', '1h', etc.
            dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

        # Build inputs for the validator model
        epoch_indices = self.get_epoch_record_batches()

        # This throws MissingEpochError automatically if inconsistent
        cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
            epoch_records_indeces=epoch_indices,
            rnx_file_dump_interval=dump_interval,
            sampling_interval=sampling_interval,
        )

    def filter_by_overlapping_groups(
        self,
        ds: xr.Dataset,
        group_preference: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Filter overlapping bands using per-group preferences.

        Parameters
        ----------
        ds : xr.Dataset
            Dataset with `sid` dimension and signal properties.
        group_preference : dict[str, str], optional
            Mapping of overlap group to preferred band.

        Returns
        -------
        xr.Dataset
            Dataset filtered to preferred overlapping bands.

        """
        if group_preference is None:
            group_preference = {
                "L1_E1_B1I": "L1",
                "L5_E5a": "L5",
                "L2_E5b_B2b": "L2",
            }

        keep = []
        for sid in ds.sid.values:
            parts = str(sid).split("|")
            band = parts[1] if len(parts) >= 2 else ""
            group = self._signal_mapper.get_overlapping_group(band)
            if group and group in group_preference:
                if band == group_preference[group]:
                    keep.append(sid)
            else:
                keep.append(sid)
        return ds.sel(sid=keep)

    def _precompute_sids_from_header(
        self,
    ) -> tuple[list[str], dict[str, dict[str, object]]]:
        """Build sorted SID list and properties from header info alone.

        Uses the header's obs_codes_per_system and static constellation
        SV lists to pre-compute the full theoretical SID set, eliminating
        the discovery pass.

        Returns
        -------
        sorted_sids : list[str]
            Sorted list of signal IDs.
        sid_properties : dict[str, dict[str, object]]
            Mapping of SID to its properties (sv, system, band, code,
            freq_center, freq_min, freq_max, bandwidth, overlapping_group).

        """
        mapper = self._signal_mapper
        signal_ids: set[str] = set()
        sid_properties: dict[str, dict[str, object]] = {}

        # Pre-compute pint arithmetic once per unique band
        band_freq_cache: dict[str, tuple[float, float, float, float]] = {}

        for system, obs_codes in self.header.obs_codes_per_system.items():
            svs = _get_constellation_svs(system)

            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]

                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )

                # Cache frequency arithmetic per band
                if band_name not in band_freq_cache:
                    center_frequency = mapper.get_band_frequency(band_name)
                    bandwidth = mapper.get_band_bandwidth(band_name)

                    if center_frequency is not None and bandwidth is not None:
                        bw = bandwidth[0] if isinstance(bandwidth, list) else bandwidth
                        freq_min = center_frequency - (bw / 2.0)
                        freq_max = center_frequency + (bw / 2.0)
                        band_freq_cache[band_name] = (
                            float(center_frequency),
                            float(freq_min),
                            float(freq_max),
                            float(bw),
                        )
                    else:
                        band_freq_cache[band_name] = (
                            np.nan,
                            np.nan,
                            np.nan,
                            np.nan,
                        )

                freq_center, freq_min, freq_max, bw = band_freq_cache[band_name]
                overlapping_group = mapper.get_overlapping_group(band_name)

                sid_suffix = "|" + band_name + "|" + code_char

                for sv in svs:
                    sid = sv + sid_suffix
                    if sid not in signal_ids:
                        signal_ids.add(sid)
                        sid_properties[sid] = {
                            "sv": sv,
                            "system": system,
                            "band": band_name,
                            "code": code_char,
                            "freq_center": freq_center,
                            "freq_min": freq_min,
                            "freq_max": freq_max,
                            "bandwidth": bw,
                            "overlapping_group": overlapping_group,
                        }

        sorted_sids = sorted(signal_ids)
        return sorted_sids, {s: sid_properties[s] for s in sorted_sids}

    def _create_dataset_single_pass(self) -> xr.Dataset:
        """Create xarray Dataset in a single pass over the file.

        Pre-allocates arrays using header-derived SID set and epoch count,
        then fills them by parsing observations inline without Pydantic
        models or function-call overhead.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and standard variables.

        """
        lines = self._load_file()
        epoch_batches = self.get_epoch_record_batches()
        n_epochs = len(epoch_batches)

        sorted_sids, sid_properties = self._precompute_sids_from_header()
        n_sids = len(sorted_sids)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}

        # Pre-allocate arrays
        timestamps = np.empty(n_epochs, dtype="datetime64[ns]")
        snr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pseudo = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        phase = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        doppler = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        lli = np.full((n_epochs, n_sids), -1, dtype=DTYPES["LLI"])
        ssi = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        # Build obs_code → (obs_type, sid_suffix) lookup per system
        mapper = self._signal_mapper
        system_obs_lut: dict[str, list[tuple[str, str]]] = {}
        for system, obs_codes in self.header.obs_codes_per_system.items():
            lut: list[tuple[str, str]] = []
            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    lut.append(("", ""))
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]
                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )
                obs_type = obs_code[0]
                lut.append((obs_type, "|" + band_name + "|" + code_char))
            system_obs_lut[system] = lut

        # Single pass over all epochs — skip unparseable epoch lines
        valid_mask = np.ones(n_epochs, dtype=bool)
        for t_idx, (start, end) in enumerate(epoch_batches):
            epoch_line = lines[start]

            # Inline epoch parsing (no Pydantic model)
            m = _EPOCH_RE.match(epoch_line)
            if m is None:
                valid_mask[t_idx] = False
                continue

            year, month, day = int(m[1]), int(m[2]), int(m[3])
            hour, minute = int(m[4]), int(m[5])
            seconds = float(m[6])
            sec_int = int(seconds)
            usec = int((seconds - sec_int) * 1_000_000)
            ts = np.datetime64(
                f"{year:04d}-{month:02d}-{day:02d}"
                f"T{hour:02d}:{minute:02d}:{sec_int:02d}",
                "ns",
            )
            ts += np.timedelta64(usec, "us")
            timestamps[t_idx] = ts

            # Parse satellite data lines inline
            for line_idx in range(start + 1, end):
                sat_line = lines[line_idx]
                if len(sat_line) < 3:
                    continue
                sv = sat_line[:3].strip()
                if not sv:
                    continue
                system = sv[0]
                lut_list = system_obs_lut.get(system)
                if lut_list is None:
                    continue

                data_part = sat_line[3:]
                data_part_len = len(data_part)

                for i, (obs_type, sid_suffix) in enumerate(lut_list):
                    if not obs_type:
                        continue

                    col_start = i * 16
                    if col_start >= data_part_len:
                        break

                    sid_key = sv + sid_suffix
                    s_idx = sid_to_idx.get(sid_key)
                    if s_idx is None:
                        continue

                    col_end = col_start + 16
                    slice_text = data_part[col_start:col_end]

                    value, obs_lli, obs_ssi = _parse_obs_fast(slice_text)
                    if value is None:
                        continue

                    if obs_type == "S":
                        if value != 0:
                            snr[t_idx, s_idx] = value
                    elif obs_type == "C":
                        pseudo[t_idx, s_idx] = value
                    elif obs_type == "L":
                        phase[t_idx, s_idx] = value
                    elif obs_type == "D":
                        doppler[t_idx, s_idx] = value

                    if obs_lli is not None:
                        lli[t_idx, s_idx] = obs_lli
                    if obs_ssi is not None:
                        ssi[t_idx, s_idx] = obs_ssi

        # Drop epochs that failed to parse
        if not valid_mask.all():
            timestamps = timestamps[valid_mask]
            snr = snr[valid_mask]
            pseudo = pseudo[valid_mask]
            phase = phase[valid_mask]
            doppler = doppler[valid_mask]
            lli = lli[valid_mask]
            ssi = ssi[valid_mask]

        # Build coordinate arrays from pre-computed properties
        sv_list = np.array(
            [sid_properties[sid]["sv"] for sid in sorted_sids], dtype=object
        )
        constellation_list = np.array(
            [sid_properties[sid]["system"] for sid in sorted_sids], dtype=object
        )
        band_list = np.array(
            [sid_properties[sid]["band"] for sid in sorted_sids], dtype=object
        )
        code_list = np.array(
            [sid_properties[sid]["code"] for sid in sorted_sids], dtype=object
        )
        freq_center_list = [sid_properties[sid]["freq_center"] for sid in sorted_sids]
        freq_min_list = [sid_properties[sid]["freq_min"] for sid in sorted_sids]
        freq_max_list = [sid_properties[sid]["freq_max"] for sid in sorted_sids]

        signal_id_coord = xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        )
        coords = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": signal_id_coord,
            "sv": ("sid", sv_list, COORDS_METADATA["sv"]),
            "system": ("sid", constellation_list, COORDS_METADATA["system"]),
            "band": ("sid", band_list, COORDS_METADATA["band"]),
            "code": ("sid", code_list, COORDS_METADATA["code"]),
            "freq_center": (
                "sid",
                np.asarray(freq_center_list, dtype=DTYPES["freq_center"]),
                COORDS_METADATA["freq_center"],
            ),
            "freq_min": (
                "sid",
                np.asarray(freq_min_list, dtype=DTYPES["freq_min"]),
                COORDS_METADATA["freq_min"],
            ),
            "freq_max": (
                "sid",
                np.asarray(freq_max_list, dtype=DTYPES["freq_max"]),
                COORDS_METADATA["freq_max"],
            ),
        }

        if self.header.signal_strength_unit == UREG.dBHz:
            snr_meta = CN0_METADATA
        else:
            snr_meta = SNR_METADATA

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr, snr_meta),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pseudo,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (
                    ["epoch", "sid"],
                    phase,
                    OBSERVABLES_METADATA["Phase"],
                ),
                "Doppler": (
                    ["epoch", "sid"],
                    doppler,
                    OBSERVABLES_METADATA["Doppler"],
                ),
                "LLI": (
                    ["epoch", "sid"],
                    lli,
                    OBSERVABLES_METADATA["LLI"],
                ),
                "SSI": (
                    ["epoch", "sid"],
                    ssi,
                    OBSERVABLES_METADATA["SSI"],
                ),
            },
            coords=coords,
            attrs={**self._build_attrs()},
        )

        if self.apply_overlap_filter:
            ds = self.filter_by_overlapping_groups(ds, self.overlap_preferences)

        return ds

    def create_rinex_netcdf_with_signal_id(
        self,
        start: datetime | None = None,
        end: datetime | None = None,
    ) -> xr.Dataset:
        """Create a NetCDF dataset with signal IDs.

        Always uses the fast single-pass path.  Optionally restricts to
        epochs within a datetime range via post-filtering.

        Parameters
        ----------
        start : datetime, optional
            Start of time range (inclusive).
        end : datetime, optional
            End of time range (inclusive).

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid).

        """
        ds = self._create_dataset_single_pass()

        if start is not None or end is not None:
            ds = ds.sel(epoch=slice(start, end))

        return ds

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert RINEX observations to xarray.Dataset with signal ID structure.

        Parameters
        ----------
        outname : Path or str, optional
            If provided, saves dataset to this file path
        keep_data_vars : list of str or None, optional
            Data variables to include in dataset. Defaults to config value.
        write_global_attrs : bool, default False
            If True, adds comprehensive global attributes
        pad_global_sid : bool, default True
            If True, pads to global signal ID space
        strip_fillval : bool, default True
            If True, removes fill values
        add_future_datavars : bool, default True
            If True, adds placeholder variables for future data
        keep_sids : list of str or None, default None
            If provided, filters/pads dataset to these specific SIDs.
            If None and pad_global_sid=True, pads to all possible SIDs.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and requested data variables

        """
        outname = cast(Path | str | None, kwargs.pop("outname", None))
        write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
        pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
        strip_fillval = bool(kwargs.pop("strip_fillval", True))
        add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
        keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

        if keep_data_vars is None:
            from canvod.utils.config import load_config

            keep_data_vars = load_config().processing.processing.keep_rnx_vars

        ds = self.create_rinex_netcdf_with_signal_id()

        # drop unwanted vars
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            # Pad/filter to specified sids or all possible sids
            ds = pad_to_global_sid(ds, keep_sids=keep_sids)

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        if add_future_datavars:
            # Placeholder: adding future data variables is not yet implemented
            pass

        if write_global_attrs:
            ds.attrs.update(self._create_comprehensive_attrs())

        ds.attrs.update(self._build_attrs())

        if outname:
            from canvod.utils.config import load_config as _load_config

            comp = _load_config().processing.compression
            encoding = {
                var: {"zlib": comp.zlib, "complevel": comp.complevel}
                for var in ds.data_vars
            }
            ds.to_netcdf(str(outname), encoding=encoding)

        # Validate output structure for pipeline compatibility
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

    def validate_rinex_304_compliance(
        self,
        ds: xr.Dataset | None = None,
        strict: bool = False,
        print_report: bool = True,
    ) -> dict[str, list[str]]:
        """Run enhanced RINEX 3.04 specification validation.

        Validates:
        1. System-specific observation codes
        2. GLONASS mandatory fields (slot/frequency, biases)
        3. Phase shift records (RINEX 3.01+)
        4. Observation value ranges

        Parameters
        ----------
        ds : xr.Dataset, optional
            Dataset to validate. If None, creates one from current file.
        strict : bool
            If True, raise ValueError on validation failures
        print_report : bool
            If True, print validation report to console

        Returns
        -------
        dict[str, list[str]]
            Validation results by category

        Examples
        --------
        >>> reader = Rnxv3Obs(fpath="station.24o")
        >>> results = reader.validate_rinex_304_compliance()
        >>> # Or validate a specific dataset
        >>> ds = reader.to_ds()
        >>> results = reader.validate_rinex_304_compliance(ds=ds)

        """
        if ds is None:
            ds = self.to_ds(write_global_attrs=False)

        # Prepare header dict for validators
        header_dict: dict[str, Any] = {
            "obs_codes_per_system": self.header.obs_codes_per_system,
        }

        # Add GLONASS-specific headers if available
        if hasattr(self.header, "glonass_slot_frq"):
            header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

        if hasattr(self.header, "glonass_cod_phs_bis"):
            header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

        if hasattr(self.header, "phase_shift"):
            header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

        # Run validation
        results = RINEX304ComplianceValidator.validate_all(
            ds=ds, header_dict=header_dict, strict=strict
        )

        if print_report:
            RINEX304ComplianceValidator.print_validation_report(results)

        return results

    def _create_comprehensive_attrs(self) -> dict[str, object]:
        attrs: dict[str, object] = {
            "File Path": str(self.fpath),
            "File Type": self.header.filetype,
            "RINEX Version": self.header.version,
            "RINEX Type": self.header.rinextype,
            "Observer": self.header.observer,
            "Agency": self.header.agency,
            "Date": self.header.date.isoformat(),
            "Marker Name": self.header.marker_name,
            "Marker Number": self.header.marker_number,
            "Marker Type": self.header.marker_type,
            "Approximate Position": (
                f"(X = {self.header.approx_position[0].magnitude} "
                f"{self.header.approx_position[0].units:~}, "
                f"Y = {self.header.approx_position[1].magnitude} "
                f"{self.header.approx_position[1].units:~}, "
                f"Z = {self.header.approx_position[2].magnitude} "
                f"{self.header.approx_position[2].units:~})"
            ),
            "Receiver Type": self.header.receiver_type,
            "Receiver Version": self.header.receiver_version,
            "Receiver Number": self.header.receiver_number,
            "Antenna Type": self.header.antenna_type,
            "Antenna Number": self.header.antenna_number,
            "Antenna Position": (
                f"(X = {self.header.antenna_position[0].magnitude} "
                f"{self.header.antenna_position[0].units:~}, "
                f"Y = {self.header.antenna_position[1].magnitude} "
                f"{self.header.antenna_position[1].units:~}, "
                f"Z = {self.header.antenna_position[2].magnitude} "
                f"{self.header.antenna_position[2].units:~})"
            ),
            "Program": self.header.pgm,
            "Run By": self.header.run_by,
            "Time of First Observation": json.dumps(
                {k: v.isoformat() for k, v in self.header.t0.items()}
            ),
            "GLONASS COD": self.header.glonass_cod,
            "GLONASS PHS": self.header.glonass_phs,
            "GLONASS BIS": self.header.glonass_bis,
            "GLONASS Slot Frequency Dict": json.dumps(
                self.header.glonass_slot_freq_dict
            ),
            "Leap Seconds": f"{self.header.leap_seconds:~}",
        }
        return attrs

header property

Expose validated header (read-only).

Returns

Rnxv3Header
    Parsed and validated RINEX header.

file_hash property

Return cached SHA256 short hash of the file content.

Returns

str
    16-character short hash for deduplication.

start_time property

Return start time of observations from header.

Returns

datetime
    First observation timestamp.

end_time property

Return end time of observations from last epoch.

Returns

datetime
    Last observation timestamp.

systems property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers (G, R, E, C, J, S, I).

num_epochs property

Return number of epochs in file.

Returns

int
    Total epoch count.

num_satellites property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

epochs property

Materialize all epochs (legacy compatibility).

Returns

list of Rnxv3ObsEpochRecord
    All epochs in memory (use iter_epochs for efficiency).

__str__()

Return a human-readable summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable summary."""
    return (
        f"{self.__class__.__name__}:\n"
        f"  File Path: {self.fpath}\n"
        f"  Header: {self.header}\n"
        f"  Polarization: {self.polarization}\n"
    )

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
817
818
819
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return f"{self.__class__.__name__}(fpath={self.fpath})"

get_epoch_record_batches(epoch_record_indicator=EPOCH_RECORD_INDICATOR)

Get the start and end line numbers for each epoch in the file.

Parameters

epoch_record_indicator : str, default '>'
    Character marking epoch record lines.

Returns

list of tuple of int
    List of (start_line, end_line) pairs for each epoch.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_epoch_record_batches(
    self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
) -> list[tuple[int, int]]:
    """Get the start and end line numbers for each epoch in the file.

    Parameters
    ----------
    epoch_record_indicator : str, default '>'
        Character marking epoch record lines.

    Returns
    -------
    list of tuple of int
        List of (start_line, end_line) pairs for each epoch.

    """
    if self._cached_epoch_batches is not None:
        return self._cached_epoch_batches

    lines = self._load_file()
    starts = [
        i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
    ]
    starts.append(len(lines))  # Add EOF
    self._cached_epoch_batches = [
        (start, starts[i + 1])
        for i, start in enumerate(starts)
        if i + 1 < len(starts)
    ]
    return self._cached_epoch_batches
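The batching logic above can be illustrated on a toy line list: epoch records start with `>`, and each batch spans from one marker up to (but not including) the next, with an EOF sentinel closing the final batch. A minimal sketch (the sample lines are hypothetical, not real RINEX content):

```python
def epoch_batches(lines: list[str], indicator: str = ">") -> list[tuple[int, int]]:
    """Pair each epoch-record line index with the start of the next (or EOF)."""
    starts = [i for i, line in enumerate(lines) if line.startswith(indicator)]
    starts.append(len(lines))  # sentinel: EOF closes the final batch
    return [(s, starts[i + 1]) for i, s in enumerate(starts) if i + 1 < len(starts)]


lines = ["> epoch 1", "G01 ...", "G02 ...", "> epoch 2", "G01 ..."]
print(epoch_batches(lines))  # → [(0, 3), (3, 5)]
```

Each `(start, end)` pair gives the epoch line at `start` and the satellite data lines in `start + 1 .. end - 1`, which is exactly how `_create_dataset_single_pass` consumes the batches.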

parse_observation_slice(slice_text)

Parse a RINEX observation slice into value, LLI, and SSI.

Enhanced to handle both standard 16-character format and variable-length records.

Parameters

slice_text : str
    Observation slice to parse.

Returns

tuple[float | None, int | None, int | None]
    Parsed (value, LLI, SSI) tuple.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def parse_observation_slice(
    self,
    slice_text: str,
) -> tuple[float | None, int | None, int | None]:
    """Parse a RINEX observation slice into value, LLI, and SSI.

    Enhanced to handle both standard 16-character format and
    variable-length records.

    Parameters
    ----------
    slice_text : str
        Observation slice to parse.

    Returns
    -------
    tuple[float | None, int | None, int | None]
        Parsed (value, LLI, SSI) tuple.

    """
    if not slice_text or not slice_text.strip():
        return None, None, None

    try:
        # Method 1: Standard RINEX format with decimal at position -6
        if (
            len(slice_text) >= OBS_SLICE_MIN_LEN
            and len(slice_text) <= OBS_SLICE_MAX_LEN
            and slice_text[OBS_SLICE_DECIMAL_POS] == "."
        ):
            slice_chars = list(slice_text)
            ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
            lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

            # Convert LLI and SSI
            lli = int(lli) if lli.strip() and lli.isdigit() else None
            ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

            # Convert value
            value_str = "".join(slice_chars).strip()
            if value_str:
                value = float(value_str)
                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    try:
        # Method 2: Flexible parsing for variable-length records
        slice_trimmed = slice_text.strip()
        if not slice_trimmed:
            return None, None, None

        # Look for a decimal point to identify the numeric value
        if "." in slice_trimmed:
            # Find the main numeric value (supports negative numbers)
            number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

            if number_match:
                value = float(number_match.group(1))

                # Check for LLI/SSI indicators after the number
                remaining_part = slice_trimmed[number_match.end() :].strip()
                lli = None
                ssi = None

                # Parse remaining characters as potential LLI/SSI
                if remaining_part:
                    # Could be just SSI, or LLI followed by SSI
                    if len(remaining_part) == 1:
                        # Just one indicator - assume it's SSI
                        if remaining_part.isdigit():
                            ssi = int(remaining_part)
                    elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                        # Two or more characters - take last two as LLI, SSI
                        lli_char = remaining_part[-2]
                        ssi_char = remaining_part[-1]

                        if lli_char.isdigit():
                            lli = int(lli_char)
                        if ssi_char.isdigit():
                            ssi = int(ssi_char)

                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    # Method 3: Last resort - try simple float parsing
    try:
        simple_value = float(slice_text.strip())
        return simple_value, None, None
    except ValueError:
        pass

    return None, None, None

process_satellite_data(s)

Process satellite data line into a Satellite object with observations.

Handles variable-length observation records correctly by adaptively parsing based on the actual line length and content.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def process_satellite_data(self, s: str) -> Satellite:
    """Process satellite data line into a Satellite object with observations.

    Handles variable-length observation records correctly by adaptively parsing
    based on the actual line length and content.
    """
    sv = s[:3].strip()
    satellite = Satellite(sv=sv)
    bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

    # Get the data part (after sv identifier)
    data_part = s[3:]

    # Process each observation adaptively
    for i, band in enumerate(bands_tbe):
        start_idx = i * 16
        end_idx = start_idx + 16

        # Check if we have enough data for this observation
        if start_idx >= len(data_part):
            # No more data available - create empty observation
            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=None,
                lli=None,
                ssi=None,
            )
            satellite.add_observation(observation)
            continue

        # Extract the slice, but handle variable length
        if end_idx <= len(data_part):
            # Full 16-character slice available
            slice_data = data_part[start_idx:end_idx]
        else:
            # Partial slice - pad with spaces to maintain consistency
            available_slice = data_part[start_idx:]
            slice_data = available_slice.ljust(16)  # Pad with spaces if needed

        value, lli, ssi = self.parse_observation_slice(slice_data)

        observation = Observation(
            obs_type=band.split("|")[1][0],
            value=value,
            lli=lli,
            ssi=ssi,
        )
        satellite.add_observation(observation)

    return satellite
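The slicing loop above is plain fixed-width chunking: each observation occupies a 16-character field, and a truncated trailing field is padded back to full width. A minimal sketch of just that step (`split_fields` is a hypothetical helper for illustration):

```python
def split_fields(data_part: str, n_fields: int, width: int = 16) -> list[str]:
    """Slice a RINEX data line into fixed-width fields, padding short tails."""
    fields = []
    for i in range(n_fields):
        chunk = data_part[i * width:(i + 1) * width]
        fields.append(chunk.ljust(width))  # pad a truncated trailing field
    return fields

line = "  24587765.123 7  24587760.891 6"
print(split_fields(line, 3))  # third field is all spaces (no data)
```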

iter_epochs()

Yield epochs one by one instead of materializing the whole list.

Returns

Generator
    Generator yielding Rnxv3ObsEpochRecord objects

Yields

Rnxv3ObsEpochRecord
    Each epoch with timestamp and satellite observations

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
    """Yield epochs one by one instead of materializing the whole list.

    Returns
    -------
    Generator
        Generator yielding Rnxv3ObsEpochRecord objects

    Yields
    ------
    Rnxv3ObsEpochRecord
        Each epoch with timestamp and satellite observations

    """
    for start, end in self.get_epoch_record_batches():
        try:
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )

            # Skip event epochs (flag 2-6: special records, not observations)
            if info.epoch_flag > 1:
                continue

            # Filter out blank/whitespace-only lines from data slice
            data = [line for line in self._lines[start + 1 : end] if line.strip()]
            epoch = Rnxv3ObsEpochRecord(
                info=info,
                data=[self.process_satellite_data(line) for line in data],
            )
            yield epoch
        except (InvalidEpochError, IncompleteEpochError, ValueError):
            # Skip epochs with validation errors (invalid SV, malformed data,
            # pydantic ValidationError inherits from ValueError)
            pass

iter_epochs_in_range(start, end)

Yield epochs lazily that fall into the given datetime range.

Parameters

start : datetime
    Start of time range (inclusive)
end : datetime
    End of time range (inclusive)

Returns

Generator
    Generator yielding epochs in the specified range

Yields

Rnxv3ObsEpochRecord
    Epochs within the time range

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs_in_range(
    self,
    start: datetime,
    end: datetime,
) -> Iterable[Rnxv3ObsEpochRecord]:
    """Yield epochs lazily that fall into the given datetime range.

    Parameters
    ----------
    start : datetime
        Start of time range (inclusive)
    end : datetime
        End of time range (inclusive)

    Returns
    -------
    Generator
        Generator yielding epochs in the specified range

    Yields
    ------
    Rnxv3ObsEpochRecord
        Epochs within the time range

    """
    for epoch in self.iter_epochs():
        dt = self.get_datetime_from_epoch_record_info(epoch.info)
        if start <= dt <= end:
            yield epoch
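The range filter above is a lazy inclusive filter over a stream of timestamps. A self-contained sketch of the same pattern using plain datetimes (`in_range` is a hypothetical stand-in for the epoch iterator):

```python
from datetime import datetime, timedelta, timezone

def in_range(stamps, start, end):
    """Lazily yield timestamps inside [start, end], mirroring the inclusive filter."""
    for dt in stamps:
        if start <= dt <= end:
            yield dt

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ts = [t0 + timedelta(seconds=30 * i) for i in range(10)]  # 30 s cadence
sel = list(in_range(ts, t0 + timedelta(seconds=60), t0 + timedelta(seconds=150)))
print(len(sel))  # → 4 (epochs at 60, 90, 120, 150 s)
```

Because the underlying iterator is a generator, nothing outside the window is materialized.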

get_datetime_from_epoch_record_info(epoch_record_info)

Convert epoch record info to datetime object.

Parameters

epoch_record_info : Rnxv3ObsEpochRecordLineModel
    Parsed epoch record line

Returns

datetime
    Timestamp from epoch record

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_datetime_from_epoch_record_info(
    self,
    epoch_record_info: Rnxv3ObsEpochRecordLineModel,
) -> datetime:
    """Convert epoch record info to datetime object.

    Parameters
    ----------
    epoch_record_info : Rnxv3ObsEpochRecordLineModel
        Parsed epoch record line

    Returns
    -------
    datetime
        Timestamp from epoch record

    """
    return datetime(
        year=int(epoch_record_info.year),
        month=int(epoch_record_info.month),
        day=int(epoch_record_info.day),
        hour=int(epoch_record_info.hour),
        minute=int(epoch_record_info.minute),
        second=int(epoch_record_info.seconds),
        tzinfo=UTC,
    )

epochrecordinfo_dt_to_numpy_dt(epch) staticmethod

Convert Python datetime to numpy datetime64[ns].

Parameters

epch : Rnxv3ObsEpochRecord
    Epoch record containing timestamp info

Returns

np.datetime64
    Numpy datetime64 with nanosecond precision

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@staticmethod
def epochrecordinfo_dt_to_numpy_dt(
    epch: Rnxv3ObsEpochRecord,
) -> np.datetime64:
    """Convert Python datetime to numpy datetime64[ns].

    Parameters
    ----------
    epch : Rnxv3ObsEpochRecord
        Epoch record containing timestamp info

    Returns
    -------
    np.datetime64
        Numpy datetime64 with nanosecond precision

    """
    dt = datetime(
        year=int(epch.info.year),
        month=int(epch.info.month),
        day=int(epch.info.day),
        hour=int(epch.info.hour),
        minute=int(epch.info.minute),
        second=int(epch.info.seconds),
        tzinfo=UTC,
    )
    # np.datetime64 doesn't support timezone info, but datetime is already UTC
    # Convert to naive datetime (UTC) to avoid warning
    return np.datetime64(dt.replace(tzinfo=None), "ns")
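The tz-stripping step above matters because `np.datetime64` does not accept timezone-aware datetimes without a deprecation warning. The step in isolation, stdlib only (the `np.datetime64` call itself is left as a comment, since numpy is assumed here):

```python
from datetime import datetime, timezone

aware = datetime(2024, 3, 1, 12, 30, 15, tzinfo=timezone.utc)
naive = aware.replace(tzinfo=None)  # drop tzinfo; wall-clock fields are unchanged
print(naive.isoformat())  # → 2024-03-01T12:30:15
# np.datetime64(naive, "ns") now converts without a timezone warning
```

This is safe only because the datetime is already known to be UTC; stripping tzinfo from a local-time value would silently shift the timestamp's meaning.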

infer_sampling_interval()

Infer sampling interval from consecutive epoch deltas.

Returns

pint.Quantity or None
    Sampling interval in seconds, or None if it cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_sampling_interval(self) -> pint.Quantity | None:
    """Infer sampling interval from consecutive epoch deltas.

    Returns
    -------
    pint.Quantity or None
        Sampling interval in seconds, or None if cannot be inferred

    """
    dts = self._epoch_datetimes()
    if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
        return None
    # Compute deltas
    deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
    if not deltas:
        return None
    # Pick the most common delta (robust to an occasional missing epoch)
    seconds = Counter(
        int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
    )
    if not seconds:
        return None
    mode_seconds, _ = seconds.most_common(1)[0]
    return (mode_seconds * UREG.second).to(UREG.seconds)

infer_dump_interval(sampling_interval=None)

Infer the intended dump interval for the RINEX file.

Parameters

sampling_interval : pint.Quantity, optional
    Known sampling interval. If provided, returns (#epochs * sampling_interval)

Returns

pint.Quantity or None
    Dump interval in seconds, or None if it cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_dump_interval(
    self, sampling_interval: pint.Quantity | None = None
) -> pint.Quantity | None:
    """Infer the intended dump interval for the RINEX file.

    Parameters
    ----------
    sampling_interval : pint.Quantity, optional
        Known sampling interval. If provided, returns (#epochs * sampling_interval)

    Returns
    -------
    pint.Quantity or None
        Dump interval in seconds, or None if cannot be inferred

    """
    idx = self.get_epoch_record_batches()
    n_epochs = len(idx)
    if n_epochs == 0:
        return None

    if sampling_interval is not None:
        return (n_epochs * sampling_interval).to(UREG.seconds)

    # Fallback: time coverage inclusive (last - first) + typical step
    dts = self._epoch_datetimes()
    if len(dts) == 0:
        return None
    if len(dts) == 1:
        # single epoch: treat as 1 * unknown step (cannot infer)
        return None

    # Estimate step from data
    est_step = self.infer_sampling_interval()
    if est_step is None:
        return None

    # Inclusive coverage often equals (n_epochs - 1) * step; intended
    # dump interval is n_epochs * step.
    return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

validate_epoch_completeness(dump_interval=None, sampling_interval=None)

Validate that the number of epochs matches the expected dump interval.

Parameters

dump_interval : str or pint.Quantity, optional
    Expected file dump interval. If None, inferred from epochs.
sampling_interval : str or pint.Quantity, optional
    Expected sampling interval. If None, inferred from epochs.

Returns

None

Raises

MissingEpochError
    If total sampling time doesn't match dump interval
ValueError
    If intervals cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_epoch_completeness(
    self,
    dump_interval: str | pint.Quantity | None = None,
    sampling_interval: str | pint.Quantity | None = None,
) -> None:
    """Validate that the number of epochs matches the expected dump interval.

    Parameters
    ----------
    dump_interval : str or pint.Quantity, optional
        Expected file dump interval. If None, inferred from epochs.
    sampling_interval : str or pint.Quantity, optional
        Expected sampling interval. If None, inferred from epochs.

    Returns
    -------
    None

    Raises
    ------
    MissingEpochError
        If total sampling time doesn't match dump interval
    ValueError
        If intervals cannot be inferred

    """
    # Normalize/Infer sampling interval
    if sampling_interval is None:
        inferred = self.infer_sampling_interval()
        if inferred is None:
            msg = "Could not infer sampling interval from epochs"
            raise ValueError(msg)
        sampling_interval = inferred
    # normalize to pint
    elif not isinstance(sampling_interval, pint.Quantity):
        sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

    # Normalize/Infer dump interval
    if dump_interval is None:
        inferred_dump = self.infer_dump_interval(
            sampling_interval=sampling_interval
        )
        if inferred_dump is None:
            msg = "Could not infer dump interval from file"
            raise ValueError(msg)
        dump_interval = inferred_dump
    elif not isinstance(dump_interval, pint.Quantity):
        # Accept '15 min', '1h', etc.
        dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

    # Build inputs for the validator model
    epoch_indices = self.get_epoch_record_batches()

    # This throws MissingEpochError automatically if inconsistent
    cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
        epoch_records_indeces=epoch_indices,
        rnx_file_dump_interval=dump_interval,
        sampling_interval=sampling_interval,
    )
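At its core, the completeness check compares the observed epoch count against `dump_interval / sampling_interval`. A minimal sketch of that arithmetic, without pint or the pydantic validator model (`check_completeness` is a hypothetical stand-in, raising a plain `ValueError` instead of `MissingEpochError`):

```python
def check_completeness(n_epochs: int, dump_s: float, sampling_s: float) -> None:
    """Raise if the epoch count doesn't fill the dump interval (hypothetical check)."""
    expected = round(dump_s / sampling_s)
    if n_epochs != expected:
        raise ValueError(f"expected {expected} epochs, found {n_epochs}")

check_completeness(n_epochs=30, dump_s=900, sampling_s=30)  # 15 min at 30 s → OK
```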

filter_by_overlapping_groups(ds, group_preference=None)

Filter overlapping bands using per-group preferences.

Parameters

ds : xr.Dataset
    Dataset with sid dimension and signal properties.
group_preference : dict[str, str], optional
    Mapping of overlap group to preferred band.

Returns

xr.Dataset
    Dataset filtered to preferred overlapping bands.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def filter_by_overlapping_groups(
    self,
    ds: xr.Dataset,
    group_preference: dict[str, str] | None = None,
) -> xr.Dataset:
    """Filter overlapping bands using per-group preferences.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset with `sid` dimension and signal properties.
    group_preference : dict[str, str], optional
        Mapping of overlap group to preferred band.

    Returns
    -------
    xr.Dataset
        Dataset filtered to preferred overlapping bands.

    """
    if group_preference is None:
        group_preference = {
            "L1_E1_B1I": "L1",
            "L5_E5a": "L5",
            "L2_E5b_B2b": "L2",
        }

    keep = []
    for sid in ds.sid.values:
        parts = str(sid).split("|")
        band = parts[1] if len(parts) >= 2 else ""
        group = self._signal_mapper.get_overlapping_group(band)
        if group and group in group_preference:
            if band == group_preference[group]:
                keep.append(sid)
        else:
            keep.append(sid)
    return ds.sel(sid=keep)
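The keep/drop logic above reduces to a per-band lookup: a signal ID survives unless its band belongs to an overlap group with a different preferred band. A self-contained sketch with a hypothetical band-to-group table standing in for the reader's signal mapper:

```python
# Hypothetical band→overlap-group table (the reader resolves this via its signal mapper)
OVERLAP_GROUP = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
PREFERENCE = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}

def keep_sid(sid: str) -> bool:
    """Keep a 'SV|band|...' signal ID unless it loses to a preferred overlapping band."""
    parts = sid.split("|")
    band = parts[1] if len(parts) >= 2 else ""
    group = OVERLAP_GROUP.get(band)
    if group and group in PREFERENCE:
        return band == PREFERENCE[group]
    return True  # bands outside any overlap group always pass

sids = ["G01|L1|C", "E12|E1|C", "G01|L5|Q", "R05|G1|C"]
print([s for s in sids if keep_sid(s)])  # → ['G01|L1|C', 'G01|L5|Q', 'R05|G1|C']
```

The Galileo E1 entry is dropped because its group prefers GPS L1; the GLONASS G1 entry passes untouched because it belongs to no overlap group.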

create_rinex_netcdf_with_signal_id(start=None, end=None)

Create a NetCDF dataset with signal IDs.

Always uses the fast single-pass path. Optionally restricts to epochs within a datetime range via post-filtering.

Parameters

start : datetime, optional
    Start of time range (inclusive).
end : datetime, optional
    End of time range (inclusive).

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def create_rinex_netcdf_with_signal_id(
    self,
    start: datetime | None = None,
    end: datetime | None = None,
) -> xr.Dataset:
    """Create a NetCDF dataset with signal IDs.

    Always uses the fast single-pass path.  Optionally restricts to
    epochs within a datetime range via post-filtering.

    Parameters
    ----------
    start : datetime, optional
        Start of time range (inclusive).
    end : datetime, optional
        End of time range (inclusive).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid).

    """
    ds = self._create_dataset_single_pass()

    if start or end:
        ds = ds.sel(epoch=slice(start, end))

    return ds

to_ds(keep_data_vars=None, **kwargs)

Convert RINEX observations to xarray.Dataset with signal ID structure.

Parameters

outname : Path or str, optional
    If provided, saves dataset to this file path
keep_data_vars : list of str or None, optional
    Data variables to include in dataset. Defaults to config value.
write_global_attrs : bool, default False
    If True, adds comprehensive global attributes
pad_global_sid : bool, default True
    If True, pads to global signal ID space
strip_fillval : bool, default True
    If True, removes fill values
add_future_datavars : bool, default True
    If True, adds placeholder variables for future data
keep_sids : list of str or None, default None
    If provided, filters/pads dataset to these specific SIDs.
    If None and pad_global_sid=True, pads to all possible SIDs.

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid) and requested data variables

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert RINEX observations to xarray.Dataset with signal ID structure.

    Parameters
    ----------
    outname : Path or str, optional
        If provided, saves dataset to this file path
    keep_data_vars : list of str or None, optional
        Data variables to include in dataset. Defaults to config value.
    write_global_attrs : bool, default False
        If True, adds comprehensive global attributes
    pad_global_sid : bool, default True
        If True, pads to global signal ID space
    strip_fillval : bool, default True
        If True, removes fill values
    add_future_datavars : bool, default True
        If True, adds placeholder variables for future data
    keep_sids : list of str or None, default None
        If provided, filters/pads dataset to these specific SIDs.
        If None and pad_global_sid=True, pads to all possible SIDs.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid) and requested data variables

    """
    outname = cast(Path | str | None, kwargs.pop("outname", None))
    write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
    pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
    strip_fillval = bool(kwargs.pop("strip_fillval", True))
    add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
    keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

    if keep_data_vars is None:
        from canvod.utils.config import load_config

        keep_data_vars = load_config().processing.processing.keep_rnx_vars

    ds = self.create_rinex_netcdf_with_signal_id()

    # drop unwanted vars
    for var in list(ds.data_vars):
        if var not in keep_data_vars:
            ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        # Pad/filter to specified sids or all possible sids
        ds = pad_to_global_sid(ds, keep_sids=keep_sids)

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    if add_future_datavars:
        pass

    if write_global_attrs:
        ds.attrs.update(self._create_comprehensive_attrs())

    ds.attrs.update(self._build_attrs())

    if outname:
        from canvod.utils.config import load_config as _load_config

        comp = _load_config().processing.compression
        encoding = {
            var: {"zlib": comp.zlib, "complevel": comp.complevel}
            for var in ds.data_vars
        }
        ds.to_netcdf(str(outname), encoding=encoding)

    # Validate output structure for pipeline compatibility
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

validate_rinex_304_compliance(ds=None, strict=False, print_report=True)

Run enhanced RINEX 3.04 specification validation.

Validates:
1. System-specific observation codes
2. GLONASS mandatory fields (slot/frequency, biases)
3. Phase shift records (RINEX 3.01+)
4. Observation value ranges

Parameters

ds : xr.Dataset, optional
    Dataset to validate. If None, creates one from current file.
strict : bool
    If True, raise ValueError on validation failures
print_report : bool
    If True, print validation report to console

Returns

dict[str, list[str]]
    Validation results by category

Examples

>>> reader = Rnxv3Obs(fpath="station.24o")
>>> results = reader.validate_rinex_304_compliance()

Or validate a specific dataset

>>> ds = reader.to_ds()
>>> results = reader.validate_rinex_304_compliance(ds=ds)

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_rinex_304_compliance(
    self,
    ds: xr.Dataset | None = None,
    strict: bool = False,
    print_report: bool = True,
) -> dict[str, list[str]]:
    """Run enhanced RINEX 3.04 specification validation.

    Validates:
    1. System-specific observation codes
    2. GLONASS mandatory fields (slot/frequency, biases)
    3. Phase shift records (RINEX 3.01+)
    4. Observation value ranges

    Parameters
    ----------
    ds : xr.Dataset, optional
        Dataset to validate. If None, creates one from current file.
    strict : bool
        If True, raise ValueError on validation failures
    print_report : bool
        If True, print validation report to console

    Returns
    -------
    dict[str, list[str]]
        Validation results by category

    Examples
    --------
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> results = reader.validate_rinex_304_compliance()
    >>> # Or validate a specific dataset
    >>> ds = reader.to_ds()
    >>> results = reader.validate_rinex_304_compliance(ds=ds)

    """
    if ds is None:
        ds = self.to_ds(write_global_attrs=False)

    # Prepare header dict for validators
    header_dict: dict[str, Any] = {
        "obs_codes_per_system": self.header.obs_codes_per_system,
    }

    # Add GLONASS-specific headers if available
    if hasattr(self.header, "glonass_slot_frq"):
        header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

    if hasattr(self.header, "glonass_cod_phs_bis"):
        header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

    if hasattr(self.header, "phase_shift"):
        header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

    # Run validation
    results = RINEX304ComplianceValidator.validate_all(
        ds=ds, header_dict=header_dict, strict=strict
    )

    if print_report:
        RINEX304ComplianceValidator.print_validation_report(results)

    return results

SbfReader

Bases: GNSSDataReader

Read and decode a Septentrio Binary Format (SBF) observation file.

Parameters

fpath : Path
    Path to the *.sbf (or *.SBF, or receiver-named) binary file.

Examples

>>> reader = SbfReader(fpath=Path("rref213a00.25_"))
>>> print(reader.header.rx_version)
4.14.4
>>> for epoch in reader.iter_epochs():
...     for obs in epoch.observations:
...         print(obs.system, obs.prn, obs.cn0)

Notes

  • All physical-unit conversions follow RefGuide-4.14.0.
  • Physical quantities are expressed as :class:pint.Quantity objects using the shared :data:~canvod.readers.gnss_specs.constants.UREG.
  • GLONASS FDMA frequencies are resolved from the most recently seen ChannelStatus block; observations before the first ChannelStatus for a given SVID have phase_cycles=None.
  • The file is scanned once per :meth:iter_epochs call; use :attr:num_epochs for a pre-computed count (scans once on first access).
  • Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.
Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
class SbfReader(GNSSDataReader):
    """Read and decode a Septentrio Binary Format (SBF) observation file.

    Parameters
    ----------
    fpath : Path
        Path to the ``*.sbf`` (or ``*.SBF``, or receiver-named) binary file.

    Examples
    --------
    >>> reader = SbfReader(fpath=Path("rref213a00.25_"))
    >>> print(reader.header.rx_version)
    4.14.4
    >>> for epoch in reader.iter_epochs():
    ...     for obs in epoch.observations:
    ...         print(obs.system, obs.prn, obs.cn0)

    Notes
    -----
    - All physical-unit conversions follow RefGuide-4.14.0.
    - Physical quantities are expressed as :class:`pint.Quantity` objects
      using the shared :data:`~canvod.readers.gnss_specs.constants.UREG`.
    - GLONASS FDMA frequencies are resolved from ChannelStatus blocks; the
      file is pre-scanned (see :attr:`_freq_nr_cache`), so epochs before the
      first ChannelStatus for a given SVID still get correct assignments.
      SVIDs that never appear in any ChannelStatus block have
      ``phase_cycles=None``.
    - The file is scanned once per :meth:`iter_epochs` call; use
      :attr:`num_epochs` for a pre-computed count (scans once on first access).
    - Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
      from :class:`GNSSDataReader`.
    """

    model_config = ConfigDict(extra="ignore")

    @property
    def source_format(self) -> str:
        return "sbf"

    # ------------------------------------------------------------------
    # Pre-scan caches
    # ------------------------------------------------------------------

    @cached_property
    def _freq_nr_cache(self) -> dict[int, int]:
        """Pre-scan ALL ChannelStatus blocks to build a complete SVID → FreqNr map.

        Scanning the entire file once means early GLONASS epochs also have
        accurate FDMA frequency assignments in :meth:`iter_epochs`.

        Returns
        -------
        dict of {int: int}
            Mapping from Septentrio SVID to GLONASS frequency slot number.
        """
        parser = sbf_parser.SbfParser()
        cache: dict[int, int] = {}
        for name, data in parser.read(str(self.fpath)):
            if name == "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        cache[svid] = int(sat["FreqNr"])
        return cache
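
    # Illustrative usage (not part of the class API; file path is
    # hypothetical):
    #
    #     reader = SbfReader(fpath=Path("rref213a00.25_"))
    #     cache = reader._freq_nr_cache      # {svid: freq_nr, ...}
    #     # iter_epochs() works on a copy of this mapping, so ChannelStatus
    #     # blocks seen during iteration never mutate the cached pre-scan.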

    # ------------------------------------------------------------------
    # GNSSDataReader abstract property implementations
    # ------------------------------------------------------------------

    @cached_property
    def file_hash(self) -> str:
        """SHA-256 hex digest of the file (first 16 characters).

        Returns
        -------
        str
            16-character hexadecimal prefix of the SHA-256 hash.
        """
        h = hashlib.sha256(self.fpath.read_bytes())
        return h.hexdigest()[:16]

    @cached_property
    def start_time(self) -> datetime:
        """Return the timestamp of the first decoded epoch.

        Returns
        -------
        datetime
            Timezone-aware UTC datetime of the first observation epoch.

        Raises
        ------
        LookupError
            If the file contains no decodable epochs.
        """
        for epoch in self.iter_epochs():
            return epoch.timestamp
        raise LookupError(f"No epochs in {self.fpath}")

    @cached_property
    def end_time(self) -> datetime:
        """Return the timestamp of the last decoded epoch.

        Returns
        -------
        datetime
            Timezone-aware UTC datetime of the last observation epoch.

        Raises
        ------
        LookupError
            If the file contains no decodable epochs.
        """
        last: datetime | None = None
        for epoch in self.iter_epochs():
            last = epoch.timestamp
        if last is None:
            raise LookupError(f"No epochs in {self.fpath}")
        return last

    @cached_property
    def systems(self) -> list[str]:
        """Return sorted list of GNSS system codes present in the file.

        Returns
        -------
        list of str
            Sorted list of RINEX system letters (e.g. ``["E", "G", "R"]``).
        """
        return sorted(
            {obs.system for ep in self.iter_epochs() for obs in ep.observations}
        )

    @cached_property
    def num_satellites(self) -> int:
        """Return the number of unique satellites observed in the file.

        Returns
        -------
        int
            Count of unique ``system + PRN`` pairs across all epochs.
        """
        return len(
            {
                f"{obs.system}{obs.prn:02d}"
                for ep in self.iter_epochs()
                for obs in ep.observations
            }
        )

    # ------------------------------------------------------------------
    # Epoch count (existing cached property — kept for backward compat)
    # ------------------------------------------------------------------

    @cached_property
    def num_epochs(self) -> int:
        """Count the number of MeasEpoch blocks in the file.

        Returns
        -------
        int
            Total MeasEpoch block count (one per observation epoch).

        Notes
        -----
        Scans the entire file once; result is cached.
        """
        parser = sbf_parser.SbfParser()
        count = sum(
            1 for name, _ in parser.read(str(self.fpath)) if name == "MeasEpoch"
        )
        log.debug("sbf_epoch_count", fpath=str(self.fpath), num_epochs=count)
        return count

    # ------------------------------------------------------------------
    # Header
    # ------------------------------------------------------------------

    @cached_property
    def header(self) -> SbfHeader:
        """Parse the first ReceiverSetup block in the file.

        Returns
        -------
        SbfHeader
            Receiver metadata.

        Raises
        ------
        LookupError
            If no ReceiverSetup block is found.
        """
        parser = sbf_parser.SbfParser()
        for name, data in parser.read(str(self.fpath)):
            if name == "ReceiverSetup":
                return SbfHeader(
                    marker_name=_decode_bytes(data["MarkerName"]),
                    marker_number=_decode_bytes(data["MarkerNumber"]),
                    observer=_decode_bytes(data["Observer"]),
                    agency=_decode_bytes(data["Agency"]),
                    rx_serial=_decode_bytes(data["RxSerialNumber"]),
                    rx_name=_decode_bytes(data["RxName"]),
                    rx_version=_decode_bytes(data["RxVersion"]),
                    ant_serial=_decode_bytes(data["AntSerialNbr"]),
                    ant_type=_decode_bytes(data["AntType"]),
                    delta_h=float(data["deltaH"]) * UREG.meter,
                    delta_e=float(data["deltaE"]) * UREG.meter,
                    delta_n=float(data["deltaN"]) * UREG.meter,
                    latitude_rad=float(data["Latitude"]),
                    longitude_rad=float(data["Longitude"]),
                    height_m=float(data["Height"]) * UREG.meter,
                    gnss_fw_version=_decode_bytes(data["GNSSFWVersion"]),
                    product_name=_decode_bytes(data["ProductName"]),
                )
        raise LookupError(f"No ReceiverSetup block found in {self.fpath}")

    # ------------------------------------------------------------------
    # Epoch iterator
    # ------------------------------------------------------------------

    def iter_epochs(self) -> Iterator[SbfEpoch]:
        """Iterate over decoded MeasEpoch blocks.

        Yields decoded :class:`SbfEpoch` objects with all signal observations
        converted to physical units as :class:`pint.Quantity`.

        Yields
        ------
        SbfEpoch
            One decoded observation epoch.

        Notes
        -----
        - The file is scanned from start to finish on each call.
        - The :attr:`_freq_nr_cache` is built from a full pre-scan of all
          ChannelStatus blocks in the file, so every GLONASS FDMA epoch has
          accurate carrier frequencies.
        - ``delta_ls`` (leap seconds) is taken from the most recent
          ReceiverTime block; defaults to 18 if none has been seen yet.
        """
        parser = sbf_parser.SbfParser()
        freq_nr_cache: dict[int, int] = self._freq_nr_cache.copy()
        delta_ls: int = _DEFAULT_DELTA_LS

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid = int(sat["SVID"])
                        if svid != 0:
                            freq_nr_cache[svid] = int(sat["FreqNr"])

                case "MeasEpoch":
                    epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                    if epoch is not None:
                        yield epoch
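
    # Illustrative usage (mirrors the class docstring example; file path is
    # hypothetical):
    #
    #     reader = SbfReader(fpath=Path("rref213a00.25_"))
    #     for epoch in reader.iter_epochs():
    #         for obs in epoch.observations:
    #             if obs.cn0 is not None:
    #                 print(obs.system, obs.prn, obs.cn0.to(UREG.dBHz))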

    # ------------------------------------------------------------------
    # Dataset construction — observations
    # ------------------------------------------------------------------

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        pad_global_sid: bool = True,
        strip_fillval: bool = True,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert SBF observations to an ``(epoch, sid)`` xarray Dataset.

        Produces the same structure as :class:`~canvod.readers.rinex.v3_04.Rnxv3Obs`
        and passes :func:`~canvod.readers.base.validate_dataset`.

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to retain.  If ``None``, all five variables are
            kept: ``SNR``, ``Pseudorange``, ``Phase``, ``Doppler``, ``SSI``.
            Note: ``LLI`` is not produced — SBF has no loss-of-lock indicator.
        pad_global_sid : bool, default True
            If ``True``, pads the dataset to the global SID space via
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.
        strip_fillval : bool, default True
            If ``True``, removes fill values via
            :func:`canvod.auxiliary.preprocessing.strip_fillvalue`.
        **kwargs
            Optional ``keep_sids`` (list of str) is forwarded to
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`; all
            other keys are ignored.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions ``(epoch, sid)`` that passes
            :func:`~canvod.readers.base.validate_dataset`.
        """
        import math

        freq_nr_cache = self._freq_nr_cache.copy()

        # --- Single pass: collect timestamps, SID properties, and per-epoch obs ---
        # Stores per-epoch obs as dicts (SID → value) so we only scan the file once.
        # Array construction happens afterwards in fast in-memory loops.
        sid_props: dict[str, dict[str, Any]] = {}
        timestamps: list[np.datetime64] = []
        # Per-epoch accumulator: list of (snr_dict, pr_dict, ph_dict, dop_dict)
        epoch_rows: list[
            tuple[
                dict[str, float], dict[str, float], dict[str, float], dict[str, float]
            ]
        ] = []

        for epoch in self.iter_epochs():
            ts_np = np.datetime64(epoch.timestamp.replace(tzinfo=None), "ns")
            timestamps.append(ts_np)

            e_snr: dict[str, float] = {}
            e_pr: dict[str, float] = {}
            e_ph: dict[str, float] = {}
            e_dop: dict[str, float] = {}

            for obs in epoch.observations:
                props = _sid_props_from_obs(obs.svid, obs.signal_num, freq_nr_cache)
                if props is None:
                    continue
                sid = props["sid"]
                if sid not in sid_props:
                    sid_props[sid] = props
                if obs.cn0 is not None:
                    e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                if obs.pseudorange is not None:
                    e_pr[sid] = float(obs.pseudorange.to(UREG.meter).magnitude)
                if obs.phase_cycles is not None:
                    e_ph[sid] = obs.phase_cycles
                if obs.doppler is not None:
                    e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)

            epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

        sorted_sids = sorted(sid_props)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(timestamps)
        n_sids = len(sorted_sids)

        # Allocate arrays (LLI is dropped — SBF has no loss-of-lock indicator)
        snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
            for sid, val in e_snr.items():
                snr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_pr.items():
                pr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_ph.items():
                ph_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_dop.items():
                dop_arr[t_idx, sid_to_idx[sid]] = val

        # Build coordinate arrays
        freq_center = np.asarray(
            [sid_props[s]["freq_center"] for s in sorted_sids],
            dtype=DTYPES["freq_center"],
        )
        freq_min = np.asarray(
            [sid_props[s]["freq_min"] for s in sorted_sids], dtype=DTYPES["freq_min"]
        )
        freq_max = np.asarray(
            [sid_props[s]["freq_max"] for s in sorted_sids], dtype=DTYPES["freq_max"]
        )

        coords: dict[str, Any] = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        attrs = cast(dict[str, Any], self._build_attrs())

        # Add ECEF position from ReceiverSetup header for pipeline compatibility.
        # ECEFPosition.from_ds_metadata() reads "APPROX POSITION X/Y/Z".
        try:
            import pymap3d as pm

            hdr = self.header
            lat_deg = math.degrees(hdr.latitude_rad)
            lon_deg = math.degrees(hdr.longitude_rad)
            h_m = float(hdr.height_m.to(UREG.meter).magnitude)
            x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
            attrs["APPROX POSITION X"] = float(x)
            attrs["APPROX POSITION Y"] = float(y)
            attrs["APPROX POSITION Z"] = float(z)
        except (LookupError, AttributeError):
            pass  # SBF file without a ReceiverSetup block

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pr_arr,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
                "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
                "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
            },
            coords=coords,
            attrs=attrs,
        )

        # Post-process
        if keep_data_vars is not None:
            for var in list(ds.data_vars):
                if var not in keep_data_vars:
                    ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            ds = pad_to_global_sid(
                ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        validate_dataset(ds, required_vars=keep_data_vars)
        return ds
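
    # Illustrative usage (file path and SID label are hypothetical):
    #
    #     ds = SbfReader(fpath=Path("rref213a00.25_")).to_ds(
    #         keep_data_vars=["SNR"], pad_global_sid=False
    #     )
    #     snr = ds["SNR"].sel(sid="G01|L1|C")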

    # ------------------------------------------------------------------
    # Dataset construction — metadata
    # ------------------------------------------------------------------

    def to_metadata_ds(
        self, pad_global_sid: bool = True, **kwargs: object
    ) -> xr.Dataset:
        """Decode SBF metadata blocks to an ``(epoch, sid)`` xarray Dataset.

        Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and
        MeasExtra blocks in a single file scan.

        Parameters
        ----------
        pad_global_sid : bool, default True
            If ``True``, pads to the global SID space via
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions ``(epoch, sid)``.  Epoch-level scalars
            (PDOP, NrSV, …) are 1-D ``(epoch,)`` coordinates.  Satellite
            geometry (theta, phi) and signal quality (MPCorrection, …) are
            ``(epoch, sid)`` data variables.
        """
        parser = sbf_parser.SbfParser()
        freq_nr_cache = self._freq_nr_cache.copy()

        pending: dict[str, Any] = {
            "pvt": None,
            "dop": None,
            "status": None,
            "satvis": [],
            "extra": [],
        }

        # Each record: (ts, pvt, dop, status, satvis, extra, obs_map)
        records: list[tuple[Any, ...]] = []

        # sid discovery — same logic as to_ds() pass 1
        sid_props: dict[str, dict[str, Any]] = {}

        delta_ls: int = _DEFAULT_DELTA_LS

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid_cs = int(sat["SVID"])
                        if svid_cs != 0:
                            freq_nr_cache[svid_cs] = int(sat["FreqNr"])

                case "PVTGeodetic":
                    pending["pvt"] = data

                case "DOP":
                    pending["dop"] = data

                case "ReceiverStatus":
                    pending["status"] = data

                case "SatVisibility":
                    pending["satvis"] = list(data.get("SatInfo", []))

                case "MeasExtra":
                    pending["extra"] = list(data.get("MeasExtraChannel", []))

                case "MeasEpoch":
                    tow_ms = int(data["TOW"])
                    wn = int(data["WNc"])
                    ts = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                    obs_map = _build_obs_map(data)

                    # Discover sids from Type1 and Type2 sub-blocks
                    for t1 in data.get("Type_1", []):
                        svid1 = int(t1["SVID"])
                        props1 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props1 is not None and props1["sid"] not in sid_props:
                            sid_props[props1["sid"]] = props1

                        for t2 in t1.get("Type_2", []):
                            props2 = _sid_props_from_obs(
                                svid1,
                                decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                                freq_nr_cache,
                            )
                            if props2 is not None and props2["sid"] not in sid_props:
                                sid_props[props2["sid"]] = props2

                    records.append(
                        (
                            ts,
                            pending["pvt"],
                            pending["dop"],
                            pending["status"],
                            list(pending["satvis"]),
                            list(pending["extra"]),
                            obs_map,
                        )
                    )
                    pending = {
                        "pvt": None,
                        "dop": None,
                        "status": None,
                        "satvis": [],
                        "extra": [],
                    }

        # Build index structures
        sorted_sids = sorted(sid_props)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(records)
        n_sids = len(sorted_sids)

        # sv → list of sid indices (for SatVisibility broadcasting)
        sids_for_sv: dict[str, list[int]] = {}
        for sid in sorted_sids:
            sv = sid_props[sid]["sv"]
            sids_for_sv.setdefault(sv, []).append(sid_to_idx[sid])

        # (epoch, sid) data variable arrays
        theta_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        phi_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        rise_set_arr = np.full((n_epochs, n_sids), -1, dtype=np.int8)
        mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        smoothing_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        code_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        carr_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        lock_time_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        cum_loss_cont_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        car_mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        cn0_highres_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

        # (epoch,) scalar coordinate arrays
        pdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        hdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        vdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        n_sv_arr = np.full(n_epochs, -1, dtype=np.int16)
        h_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        v_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        pvt_mode_arr = np.full(n_epochs, -1, dtype=np.int8)
        mean_corr_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        cpu_load_arr = np.full(n_epochs, -1, dtype=np.int8)
        temp_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        rx_error_arr = np.full(n_epochs, 0, dtype=np.int32)

        timestamps: list[np.datetime64] = []

        # Fill arrays from records
        for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
            timestamps.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

            # DOP block → pdop, hdop, vdop
            if dop is not None:
                try:
                    pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            # PVTGeodetic → n_sv, accuracy, mode, correction age
            if pvt is not None:
                try:
                    n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                    raw_hacc = int(pvt["HAccuracy"])
                    if raw_hacc != 65535:
                        h_acc_arr[t_idx] = raw_hacc * 0.01
                    raw_vacc = int(pvt["VAccuracy"])
                    if raw_vacc != 65535:
                        v_acc_arr[t_idx] = raw_vacc * 0.01
                    pvt_mode_arr[t_idx] = int(pvt["Mode"])
                    mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                    # Also pick up DOP from PVTGeodetic if DOP block absent
                    if np.isnan(pdop_arr[t_idx]):
                        pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                        hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                        vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            # ReceiverStatus → cpu_load, temperature, rx_error
            if status is not None:
                try:
                    cpu_load_arr[t_idx] = int(status["CPULoad"])
                    raw_temp = int(status["Temperature"])
                    if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                        temp_arr[t_idx] = float(raw_temp - 100)
                    rx_error_arr[t_idx] = int(status["RxError"])
                except (KeyError, TypeError, ValueError):
                    pass

            # SatVisibility → broadcast theta/phi to all sids for that sv
            for sat_info in satvis:
                try:
                    svid_raw = int(sat_info["SVID"])
                    sys_code, prn = decode_svid(svid_raw)
                    sv = f"{sys_code}{prn:02d}"
                    theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                    phi_deg = int(sat_info["Azimuth"]) * 0.01
                    rs = int(sat_info["RiseSet"])
                    for s_idx in sids_for_sv.get(sv, []):
                        theta_arr[t_idx, s_idx] = theta_deg
                        phi_arr[t_idx, s_idx] = phi_deg
                        rise_set_arr[t_idx, s_idx] = rs
                except (KeyError, TypeError, ValueError):
                    pass

            # MeasExtra → per-(epoch, sid) signal quality
            for ch in extra:
                try:
                    type_byte = int(ch["Type"])
                    info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                    sig_num = decode_signal_num(type_byte, info_byte)
                    rx_ch = int(ch["RxChannel"])
                    svid = obs_map.get((rx_ch, sig_num))
                    if svid is None:
                        continue
                    sig_def = SIGNAL_TABLE.get(sig_num)
                    if sig_def is None:
                        continue
                    sys_code2, prn2 = decode_svid(svid)
                    sv2 = f"{sys_code2}{prn2:02d}"
                    sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                    s_idx = sid_to_idx.get(sid)
                    if s_idx is None:
                        continue
                    # MPCorrection: scale 0.001 m/LSB; some parser outputs
                    # appear to emit the key with a trailing space, so try both.
                    mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                    mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                    # SmoothingCorr: i2, scale 0.001 m/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    raw_sc = ch.get("SmoothingCorr")
                    if raw_sc is not None:
                        smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                    raw_cv = ch.get("CodeVar")
                    # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cv is not None and int(raw_cv) != 65535:
                        code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                    raw_rv = ch.get("CarrierVar")
                    # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_rv is not None and int(raw_rv) != 65535:
                        carr_var_arr[t_idx, s_idx] = float(raw_rv)
                    raw_lt = ch.get("LockTime")
                    # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_lt is not None and int(raw_lt) != 65535:
                        lock_time_arr[t_idx, s_idx] = float(raw_lt)
                    raw_clc = ch.get("CumLossCont")
                    # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_clc is not None:
                        cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                    raw_cmc = ch.get("CarMPCorr")
                    # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cmc is not None:
                        car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                    raw_misc = ch.get("Misc")
                    # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_misc is not None:
                        cn0_hr = int(raw_misc) & 0x07
                        cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
                except (KeyError, TypeError, ValueError):
                    pass

        # Build Dataset
        freq_center = np.asarray(
            [sid_props[s]["freq_center"] for s in sorted_sids], dtype=np.float32
        )
        freq_min = np.asarray(
            [sid_props[s]["freq_min"] for s in sorted_sids], dtype=np.float32
        )
        freq_max = np.asarray(
            [sid_props[s]["freq_max"] for s in sorted_sids], dtype=np.float32
        )

        coords: dict[str, Any] = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
            # Epoch-level scalars (1-D over epoch)
            "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
            "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
            "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
            "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
            "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
            "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
            "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
            "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
            "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
            "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
            "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
        }

        attrs = self._build_attrs()

        ds = xr.Dataset(
            data_vars={
                "broadcast_theta": (
                    ["epoch", "sid"],
                    np.deg2rad(theta_arr),
                    _BROADCAST_THETA_ATTRS,
                ),
                "broadcast_phi": (
                    ["epoch", "sid"],
                    np.deg2rad(phi_arr),
                    _BROADCAST_PHI_ATTRS,
                ),
                "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
                "mp_correction_m": (
                    ["epoch", "sid"],
                    mp_corr_arr,
                    _MP_CORRECTION_ATTRS,
                ),
                "smoothing_corr_m": (
                    ["epoch", "sid"],
                    smoothing_corr_arr,
                    _SMOOTHING_CORR_ATTRS,
                ),
                "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
                "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
                "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
                "cum_loss_cont": (
                    ["epoch", "sid"],
                    cum_loss_cont_arr,
                    _CUM_LOSS_CONT_ATTRS,
                ),
                "car_mp_corr_cycles": (
                    ["epoch", "sid"],
                    car_mp_corr_arr,
                    _CAR_MP_CORR_ATTRS,
                ),
                "cn0_highres_correction": (
                    ["epoch", "sid"],
                    cn0_highres_arr,
                    _CN0_HIGHRES_CORRECTION_ATTRS,
                ),
            },
            coords=coords,
            attrs=attrs,
        )

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            ds = pad_to_global_sid(
                ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        return ds

    # ------------------------------------------------------------------
    # Combined single-pass: observations + auxiliary metadata
    # ------------------------------------------------------------------

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        pad_global_sid: bool = True,
        strip_fillval: bool = True,
        store_raw_observables: bool = True,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Single file scan producing both the obs dataset and the SBF metadata dataset.

        Performs ONE ``parser.read()`` pass, collecting MeasEpoch observations
        and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously.
        ``to_ds()`` and ``to_metadata_ds()`` remain unchanged for standalone use.

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to retain in the obs dataset.
        pad_global_sid : bool, default True
            Pad obs dataset to the global SID space.
        strip_fillval : bool, default True
            Strip fill values from the obs dataset.
        store_raw_observables : bool, default True
            Add pre-correction "raw" observable variables to the obs dataset:
            ``SNR_raw``, ``Pseudorange_unsmoothed``, ``Pseudorange_raw``,
            ``Phase_raw``.  Set to ``False`` to reduce dataset size when these
            are not needed.
        **kwargs
            Forwarded to ``pad_to_global_sid`` (e.g. ``keep_sids``).

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"sbf_obs": meta_ds})``.
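
        Examples
        --------
        A minimal sketch of the CN0HighRes decode this method applies to
        ``SNR`` (the ``Misc`` byte value is illustrative, not taken from a
        real file):

        ```python
        raw_misc = 0x05                # example Misc byte from a MeasExtra sub-block
        cn0_hr = raw_misc & 0x07       # bits 0-2 hold CN0HighRes (0-7)
        correction = cn0_hr * 0.03125  # scale 0.03125 dB-Hz/LSB -> 0.15625 dB-Hz
        ```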
        """
        import math

        parser = sbf_parser.SbfParser()
        freq_nr_cache = self._freq_nr_cache.copy()
        delta_ls: int = _DEFAULT_DELTA_LS

        # Separate sid discovery for obs (matches to_ds) and metadata (matches to_metadata_ds)
        sid_props_obs: dict[str, dict[str, Any]] = {}
        sid_props_meta: dict[str, dict[str, Any]] = {}

        # Obs-side accumulators (same as to_ds)
        timestamps_obs: list[np.datetime64] = []
        epoch_rows: list[
            tuple[
                dict[str, float], dict[str, float], dict[str, float], dict[str, float]
            ]
        ] = []

        # Metadata-side accumulators (same as to_metadata_ds)
        pending: dict[str, Any] = {
            "pvt": None,
            "dop": None,
            "status": None,
            "satvis": [],
            "extra": [],
        }
        records: list[tuple[Any, ...]] = []

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid = int(sat["SVID"])
                        if svid != 0:
                            freq_nr_cache[svid] = int(sat["FreqNr"])

                case "PVTGeodetic":
                    pending["pvt"] = data

                case "DOP":
                    pending["dop"] = data

                case "ReceiverStatus":
                    pending["status"] = data

                case "SatVisibility":
                    pending["satvis"] = list(data.get("SatInfo", []))

                case "MeasExtra":
                    pending["extra"] = list(data.get("MeasExtraChannel", []))

                case "MeasEpoch":
                    # --- Obs side ---
                    epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                    if epoch is not None:
                        ts_np = np.datetime64(
                            epoch.timestamp.replace(tzinfo=None), "ns"
                        )
                        timestamps_obs.append(ts_np)
                        e_snr: dict[str, float] = {}
                        e_pr: dict[str, float] = {}
                        e_ph: dict[str, float] = {}
                        e_dop: dict[str, float] = {}
                        for obs in epoch.observations:
                            props = _sid_props_from_obs(
                                obs.svid, obs.signal_num, freq_nr_cache
                            )
                            if props is None:
                                continue
                            sid = props["sid"]
                            if sid not in sid_props_obs:
                                sid_props_obs[sid] = props
                            if obs.cn0 is not None:
                                e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                            if obs.pseudorange is not None:
                                e_pr[sid] = float(
                                    obs.pseudorange.to(UREG.meter).magnitude
                                )
                            if obs.phase_cycles is not None:
                                e_ph[sid] = obs.phase_cycles
                            if obs.doppler is not None:
                                e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)
                        epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

                    # --- Metadata side (always, even if epoch decoded as None) ---
                    tow_ms = int(data["TOW"])
                    wn = int(data["WNc"])
                    ts_meta = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                    obs_map = _build_obs_map(data)

                    # Discover sids from Type1/Type2 sub-blocks (same as to_metadata_ds)
                    for t1 in data.get("Type_1", []):
                        svid1 = int(t1["SVID"])
                        props1 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props1 is not None and props1["sid"] not in sid_props_meta:
                            sid_props_meta[props1["sid"]] = props1
                        for t2 in t1.get("Type_2", []):
                            props2 = _sid_props_from_obs(
                                svid1,
                                decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                                freq_nr_cache,
                            )
                            if (
                                props2 is not None
                                and props2["sid"] not in sid_props_meta
                            ):
                                sid_props_meta[props2["sid"]] = props2

                    records.append(
                        (
                            ts_meta,
                            pending["pvt"],
                            pending["dop"],
                            pending["status"],
                            list(pending["satvis"]),
                            list(pending["extra"]),
                            obs_map,
                        )
                    )
                    pending = {
                        "pvt": None,
                        "dop": None,
                        "status": None,
                        "satvis": [],
                        "extra": [],
                    }

        # ----------------------------------------------------------------
        # Build obs dataset (verbatim from to_ds())
        # ----------------------------------------------------------------
        sorted_sids = sorted(sid_props_obs)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(timestamps_obs)
        n_sids = len(sorted_sids)

        snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
            for sid, val in e_snr.items():
                snr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_pr.items():
                pr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_ph.items():
                ph_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_dop.items():
                dop_arr[t_idx, sid_to_idx[sid]] = val

        freq_center = np.asarray(
            [sid_props_obs[s]["freq_center"] for s in sorted_sids],
            dtype=DTYPES["freq_center"],
        )
        freq_min = np.asarray(
            [sid_props_obs[s]["freq_min"] for s in sorted_sids],
            dtype=DTYPES["freq_min"],
        )
        freq_max = np.asarray(
            [sid_props_obs[s]["freq_max"] for s in sorted_sids],
            dtype=DTYPES["freq_max"],
        )

        coords_obs: dict[str, Any] = {
            "epoch": ("epoch", timestamps_obs, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props_obs[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array(
                    [sid_props_obs[s]["system"] for s in sorted_sids], dtype=object
                ),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props_obs[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props_obs[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        attrs = cast(dict[str, Any], self._build_attrs())

        try:
            import pymap3d as pm

            hdr = self.header
            lat_deg = math.degrees(hdr.latitude_rad)
            lon_deg = math.degrees(hdr.longitude_rad)
            h_m = float(hdr.height_m.to(UREG.meter).magnitude)
            x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
            attrs["APPROX POSITION X"] = float(x)
            attrs["APPROX POSITION Y"] = float(y)
            attrs["APPROX POSITION Z"] = float(z)
        except (ImportError, LookupError, AttributeError):
            pass

        obs_ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pr_arr,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
                "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
                "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
            },
            coords=coords_obs,
            attrs=attrs,
        )

        if keep_data_vars is not None:
            for var in list(obs_ds.data_vars):
                if var not in keep_data_vars:
                    obs_ds = obs_ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            obs_ds = pad_to_global_sid(
                obs_ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            obs_ds = strip_fillvalue(obs_ds)

        validate_dataset(obs_ds, required_vars=keep_data_vars)

        # ----------------------------------------------------------------
        # Build metadata dataset (verbatim from to_metadata_ds())
        # ----------------------------------------------------------------
        sorted_sids_meta = sorted(sid_props_meta)
        sid_to_idx_meta = {sid: i for i, sid in enumerate(sorted_sids_meta)}
        n_epochs_meta = len(records)
        n_sids_meta = len(sorted_sids_meta)

        sids_for_sv: dict[str, list[int]] = {}
        for sid in sorted_sids_meta:
            sv = sid_props_meta[sid]["sv"]
            sids_for_sv.setdefault(sv, []).append(sid_to_idx_meta[sid])

        theta_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        phi_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        rise_set_arr = np.full((n_epochs_meta, n_sids_meta), -1, dtype=np.int8)
        mp_corr_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        smoothing_corr_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        code_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        carr_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        lock_time_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        cum_loss_cont_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        car_mp_corr_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        cn0_highres_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )

        pdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        hdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        vdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        n_sv_arr = np.full(n_epochs_meta, -1, dtype=np.int16)
        h_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        v_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        pvt_mode_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
        mean_corr_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        cpu_load_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
        temp_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        rx_error_arr = np.full(n_epochs_meta, 0, dtype=np.int32)

        timestamps_meta: list[np.datetime64] = []

        for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
            timestamps_meta.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

            if dop is not None:
                try:
                    pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            if pvt is not None:
                try:
                    n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                    raw_hacc = int(pvt["HAccuracy"])
                    if raw_hacc != 65535:
                        h_acc_arr[t_idx] = raw_hacc * 0.01
                    raw_vacc = int(pvt["VAccuracy"])
                    if raw_vacc != 65535:
                        v_acc_arr[t_idx] = raw_vacc * 0.01
                    pvt_mode_arr[t_idx] = int(pvt["Mode"])
                    mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                    if np.isnan(pdop_arr[t_idx]):
                        pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                        hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                        vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            if status is not None:
                try:
                    cpu_load_arr[t_idx] = int(status["CPULoad"])
                    raw_temp = int(status["Temperature"])
                    if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                        temp_arr[t_idx] = float(raw_temp - 100)
                    rx_error_arr[t_idx] = int(status["RxError"])
                except (KeyError, TypeError, ValueError):
                    pass

            for sat_info in satvis:
                try:
                    svid_raw = int(sat_info["SVID"])
                    sys_code, prn = decode_svid(svid_raw)
                    sv = f"{sys_code}{prn:02d}"
                    theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                    phi_deg = int(sat_info["Azimuth"]) * 0.01
                    rs = int(sat_info["RiseSet"])
                    for s_idx in sids_for_sv.get(sv, []):
                        theta_arr[t_idx, s_idx] = theta_deg
                        phi_arr[t_idx, s_idx] = phi_deg
                        rise_set_arr[t_idx, s_idx] = rs
                except (KeyError, TypeError, ValueError):
                    pass

            for ch in extra:
                try:
                    type_byte = int(ch["Type"])
                    info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                    sig_num = decode_signal_num(type_byte, info_byte)
                    rx_ch = int(ch["RxChannel"])
                    svid = obs_map.get((rx_ch, sig_num))
                    if svid is None:
                        continue
                    sig_def = SIGNAL_TABLE.get(sig_num)
                    if sig_def is None:
                        continue
                    sys_code2, prn2 = decode_svid(svid)
                    sv2 = f"{sys_code2}{prn2:02d}"
                    sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                    s_idx = sid_to_idx_meta.get(sid)
                    if s_idx is None:
                        continue
                    mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                    mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                    # SmoothingCorr: i2, scale 0.001 m/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    raw_sc = ch.get("SmoothingCorr")
                    if raw_sc is not None:
                        smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                    raw_cv = ch.get("CodeVar")
                    # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cv is not None and int(raw_cv) != 65535:
                        code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                    raw_rv = ch.get("CarrierVar")
                    # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_rv is not None and int(raw_rv) != 65535:
                        carr_var_arr[t_idx, s_idx] = float(raw_rv)
                    raw_lt = ch.get("LockTime")
                    # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_lt is not None and int(raw_lt) != 65535:
                        lock_time_arr[t_idx, s_idx] = float(raw_lt)
                    raw_clc = ch.get("CumLossCont")
                    # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_clc is not None:
                        cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                    raw_cmc = ch.get("CarMPCorr")
                    # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cmc is not None:
                        car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                    raw_misc = ch.get("Misc")
                    # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_misc is not None:
                        cn0_hr = int(raw_misc) & 0x07
                        cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
                except (KeyError, TypeError, ValueError):
                    pass

        freq_center_meta = np.asarray(
            [sid_props_meta[s]["freq_center"] for s in sorted_sids_meta],
            dtype=np.float32,
        )
        freq_min_meta = np.asarray(
            [sid_props_meta[s]["freq_min"] for s in sorted_sids_meta], dtype=np.float32
        )
        freq_max_meta = np.asarray(
            [sid_props_meta[s]["freq_max"] for s in sorted_sids_meta], dtype=np.float32
        )

        coords_meta: dict[str, Any] = {
            "epoch": ("epoch", timestamps_meta, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                sorted_sids_meta, dims=["sid"], attrs=COORDS_METADATA["sid"]
            ),
            "sv": (
                "sid",
                [sid_props_meta[s]["sv"] for s in sorted_sids_meta],
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                [sid_props_meta[s]["system"] for s in sorted_sids_meta],
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                [sid_props_meta[s]["band"] for s in sorted_sids_meta],
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                [sid_props_meta[s]["code"] for s in sorted_sids_meta],
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center_meta, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min_meta, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max_meta, COORDS_METADATA["freq_max"]),
            "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
            "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
            "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
            "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
            "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
            "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
            "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
            "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
            "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
            "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
            "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
        }

        attrs_meta = self._build_attrs()

        meta_ds = xr.Dataset(
            data_vars={
                "broadcast_theta": (
                    ["epoch", "sid"],
                    np.deg2rad(theta_arr),
                    _BROADCAST_THETA_ATTRS,
                ),
                "broadcast_phi": (
                    ["epoch", "sid"],
                    np.deg2rad(phi_arr),
                    _BROADCAST_PHI_ATTRS,
                ),
                "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
                "mp_correction_m": (
                    ["epoch", "sid"],
                    mp_corr_arr,
                    _MP_CORRECTION_ATTRS,
                ),
                "smoothing_corr_m": (
                    ["epoch", "sid"],
                    smoothing_corr_arr,
                    _SMOOTHING_CORR_ATTRS,
                ),
                "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
                "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
                "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
                "cum_loss_cont": (
                    ["epoch", "sid"],
                    cum_loss_cont_arr,
                    _CUM_LOSS_CONT_ATTRS,
                ),
                "car_mp_corr_cycles": (
                    ["epoch", "sid"],
                    car_mp_corr_arr,
                    _CAR_MP_CORR_ATTRS,
                ),
                "cn0_highres_correction": (
                    ["epoch", "sid"],
                    cn0_highres_arr,
                    _CN0_HIGHRES_CORRECTION_ATTRS,
                ),
            },
            coords=coords_meta,
            attrs=attrs_meta,
        )

        # Align meta_ds SID to obs_ds SID.
        # obs uses sid_props_obs (MeasEpoch); meta uses sid_props_meta (Type1/Type2)
        # — they can diverge.  Reindex fills missing SIDs with NaN.
        meta_ds = meta_ds.reindex(sid=obs_ds.sid, fill_value=np.nan)
        # rise_set is int8 with sentinel -1; NaN fill promotes to float — cast back.
        if meta_ds["rise_set"].dtype != np.int8:
            meta_ds["rise_set"] = meta_ds["rise_set"].fillna(-1).astype(np.int8)

        # Apply CN0HighRes correction from MeasExtra (Block 4000) to SNR.
        # CN0HighRes extends resolution from 0.25 to 0.03125 dB-Hz.
        # RefGuide-4.14.0, MeasExtra MeasExtraChannelSub.Misc bits 0-2, p.265.
        # Where MeasExtra was not logged the correction array is NaN → no-op.
        corr = meta_ds["cn0_highres_correction"].values  # (epoch, sid), NaN if absent
        snr_raw_values = obs_ds["SNR"].values.copy()  # preserve 0.25 dB-Hz original
        snr_corrected = snr_raw_values.copy()
        valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
        snr_corrected[valid] += corr[valid]
        snr_attrs = dict(obs_ds["SNR"].attrs)
        snr_attrs["comment"] = (
            snr_attrs.get("comment", "")
            + " CN0HighRes correction from MeasExtra (Block 4000, p.265) applied where"
            " available, improving resolution from 0.25 to 0.03125 dB-Hz."
        ).lstrip()
        obs_ds["SNR"] = xr.DataArray(
            snr_corrected,
            dims=["epoch", "sid"],
            coords=obs_ds["SNR"].coords,
            attrs=snr_attrs,
        )

        if store_raw_observables:
            # ------------------------------------------------------------------
            # Add "physically raw" observables: pre-correction versions of SNR,
            # pseudorange, and carrier phase.  NaN where MeasExtra was absent.
            # Gated by store_raw_observables (config: store_sbf_raw_observables).
            # ------------------------------------------------------------------

            # SNR_raw: 0.25 dB-Hz resolution, before CN0HighRes extension.
            obs_ds["SNR_raw"] = xr.DataArray(
                snr_raw_values,
                dims=["epoch", "sid"],
                coords=obs_ds["SNR"].coords,
                attrs=_SNR_RAW_ATTRS,
            )

            # Pseudorange_unsmoothed: Hatch-filter correction removed.
            smooth = meta_ds["smoothing_corr_m"].values
            pr_vals = obs_ds["Pseudorange"].values
            pr_unsmoothed = np.where(
                ~np.isnan(smooth), pr_vals + smooth, np.nan
            ).astype(np.float64)
            obs_ds["Pseudorange_unsmoothed"] = xr.DataArray(
                pr_unsmoothed,
                dims=["epoch", "sid"],
                coords=obs_ds["Pseudorange"].coords,
                attrs=_PSEUDORANGE_UNSMOOTHED_ATTRS,
            )

            # Pseudorange_raw: both Hatch-filter and multipath corrections removed.
            mp = meta_ds["mp_correction_m"].values
            available = ~np.isnan(smooth) & ~np.isnan(mp)
            pr_raw = np.where(available, pr_vals + smooth + mp, np.nan).astype(
                np.float64
            )
            obs_ds["Pseudorange_raw"] = xr.DataArray(
                pr_raw,
                dims=["epoch", "sid"],
                coords=obs_ds["Pseudorange"].coords,
                attrs=_PSEUDORANGE_RAW_ATTRS,
            )

            # Phase_raw: carrier multipath correction removed.
            car_mp = meta_ds["car_mp_corr_cycles"].values
            ph_vals = obs_ds["Phase"].values
            ph_raw = np.where(~np.isnan(car_mp), ph_vals + car_mp, np.nan).astype(
                np.float64
            )
            obs_ds["Phase_raw"] = xr.DataArray(
                ph_raw,
                dims=["epoch", "sid"],
                coords=obs_ds["Phase"].coords,
                attrs=_PHASE_RAW_ATTRS,
            )

        return obs_ds, {"sbf_obs": meta_ds}
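
The NaN-gated correction pattern used above (apply a per-element offset only where both the observation and the correction are finite) can be sketched with toy arrays; the values below are made up for illustration:

```python
import numpy as np

# Toy SNR matrix (epoch, sid) at 0.25 dB-Hz resolution; NaN = no observation.
snr = np.array([[42.25, np.nan], [41.50, 38.75]])
# CN0HighRes corrections from MeasExtra; NaN where the block was not logged.
corr = np.array([[0.03125, 0.0], [np.nan, -0.0625]])

corrected = snr.copy()
valid = ~np.isnan(corrected) & ~np.isnan(corr)
corrected[valid] += corr[valid]  # no-op wherever either operand is NaN

print(corrected)  # [[42.28125      nan] [41.5     38.6875]]
```

Where the correction is NaN the original 0.25 dB-Hz value survives unchanged, which is exactly the "applied where available" behaviour noted in the SNR comment attribute.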

    # ------------------------------------------------------------------
    # Private decoding helpers
    # ------------------------------------------------------------------

    def _decode_epoch(  # pylint: disable=too-many-locals
        self,
        data: dict[str, Any],
        freq_nr_cache: dict[int, int],
        delta_ls: int,
    ) -> SbfEpoch | None:
        """Decode one raw MeasEpoch dict into an :class:`SbfEpoch`.

        Parameters
        ----------
        data : dict
            Raw block dict from ``sbf_parser``.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr mapping for GLONASS FDMA frequency lookup.
        delta_ls : int
            GPS - UTC leap second offset.

        Returns
        -------
        SbfEpoch or None
            Decoded epoch, or ``None`` if decoding fails (logged as warning).
        """
        tow_ms = int(data["TOW"])
        wn = int(data["WNc"])
        timestamp = _tow_wn_to_utc(tow_ms, wn, delta_ls)
        common_flags = int(data["CommonFlags"])
        cum_clk_jumps = int(data["CumClkJumps"])

        observations: list[SbfSignalObs] = []

        for t1 in data.get("Type_1", []):
            t1_obs, t1_freq = self._decode_type1(t1, freq_nr_cache)
            if t1_obs is not None:
                observations.append(t1_obs)
                # Decode linked Type2 slave observations
                pr1 = t1_obs.pseudorange
                d1 = t1_obs.doppler
                if pr1 is not None and d1 is not None and t1_freq is not None:
                    for t2 in t1.get("Type_2", []):
                        t2_obs = self._decode_type2(
                            t2, int(t1["SVID"]), pr1, d1, t1_freq, freq_nr_cache
                        )
                        if t2_obs is not None:
                            observations.append(t2_obs)

        return SbfEpoch(
            tow_ms=tow_ms,
            wn=wn,
            timestamp=timestamp,
            common_flags=common_flags,
            cum_clk_jumps=cum_clk_jumps,
            observations=tuple(observations),
        )

    def _resolve_freq(
        self,
        sig_num: int,
        svid: int,
        freq_nr_cache: dict[int, int],
    ) -> pint.Quantity | None:
        """Return carrier frequency as a pint Quantity, or None if unavailable.

        Parameters
        ----------
        sig_num : int
            Signal type number (0-39).
        svid : int
            Septentrio internal SVID.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        pint.Quantity or None
            Carrier frequency (in MHz), or ``None`` if GLONASS and FreqNr
            not yet known, or signal not in table (e.g. L-Band MSS).
        """
        if sig_num in FDMA_SIGNAL_NUMS:
            freq_nr = freq_nr_cache.get(svid)
            if freq_nr is None:
                return None
            return glonass_freq_hz(sig_num, freq_nr)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            return None
        return sig_def.freq  # None for L-Band MSS (sig 23)
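
For FDMA signals, ``glonass_freq_hz`` derives the carrier from the per-satellite frequency channel. A standalone sketch of the standard GLONASS frequency plan, assuming the SBF convention that ``FreqNr`` stores the channel number ``k`` with an offset of +8 (an assumption here; verify against the receiver reference guide):

```python
# GLONASS FDMA carrier frequencies (standard L1/L2 plan).
# Assumption: FreqNr = k + 8, so k = freq_nr - 8 with k in [-7, +6].
def glonass_l1_mhz(freq_nr: int) -> float:
    k = freq_nr - 8
    return 1602.0 + k * 0.5625  # 562.5 kHz channel spacing on L1

def glonass_l2_mhz(freq_nr: int) -> float:
    k = freq_nr - 8
    return 1246.0 + k * 0.4375  # 437.5 kHz channel spacing on L2

print(glonass_l1_mhz(8))  # k = 0  -> 1602.0
print(glonass_l1_mhz(1))  # k = -7 -> 1598.0625
```

This is why ``_resolve_freq`` must return ``None`` until the SVID appears in the FreqNr cache: without ``k`` the carrier frequency is simply unknown.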

    def _decode_type1(  # pylint: disable=too-many-locals
        self,
        t1: dict[str, Any],
        freq_nr_cache: dict[int, int],
    ) -> tuple[SbfSignalObs | None, pint.Quantity | None]:
        """Decode a Type1 sub-block dict to an SbfSignalObs.

        Parameters
        ----------
        t1 : dict
            Raw Type1 sub-block dict.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        obs : SbfSignalObs or None
            Decoded observation, or ``None`` for unknown signals.
        freq : pint.Quantity or None
            Carrier frequency used (needed for Type2 Doppler scaling).
        """
        svid = int(t1["SVID"])
        type_byte = int(t1["Type"])
        obs_info = int(t1["ObsInfo"])
        sig_num = decode_signal_num(type_byte, obs_info)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            log.debug("sbf_unknown_signal", svid=svid, sig_num=sig_num)
            return None, None

        system, prn = decode_svid(svid)
        freq = self._resolve_freq(sig_num, svid, freq_nr_cache)

        misc = int(t1["Misc"])
        code_lsb = int(t1["CodeLSB"])
        pr = pseudorange_m(misc, code_lsb)
        dop = doppler_hz(int(t1["Doppler"]))
        carrier_msb = int(t1["CarrierMSB"])
        carrier_lsb = int(t1["CarrierLSB"])

        ph: float | None = None
        if pr is not None and freq is not None:
            ph = phase_cycles(pr, carrier_msb, carrier_lsb, freq)

        obs = SbfSignalObs(
            svid=svid,
            system=system,
            prn=prn,
            signal_num=sig_num,
            signal_type=sig_def.signal_type,
            rx_channel=int(t1["RxChannel"]),
            lock_time_ms=int(t1["LockTime"]),
            cn0=cn0_dbhz(int(t1["CN0"]), sig_num),
            pseudorange=pr,
            doppler=dop,
            phase_cycles=ph,
            obs_info=obs_info,
            is_type2=False,
        )
        return obs, freq

    def _decode_type2(  # pylint: disable=too-many-arguments,too-many-locals,too-many-positional-arguments
        self,
        t2: dict[str, Any],
        svid: int,
        pr1: pint.Quantity,
        d1: pint.Quantity,
        freq1: pint.Quantity,
        freq_nr_cache: dict[int, int],
    ) -> SbfSignalObs | None:
        """Decode a Type2 sub-block dict to an SbfSignalObs.

        Parameters
        ----------
        t2 : dict
            Raw Type2 sub-block dict.
        svid : int
            SVID of the parent Type1 sub-block.
        pr1 : pint.Quantity
            Type1 pseudorange in metres.
        d1 : pint.Quantity
            Type1 Doppler in Hz.
        freq1 : pint.Quantity
            Type1 carrier frequency.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        SbfSignalObs or None
            Decoded observation, or ``None`` for unknown signals.
        """
        type_byte = int(t2["Type"])
        obs_info = int(t2["ObsInfo"])
        sig_num = decode_signal_num(type_byte, obs_info)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            log.debug("sbf_unknown_type2_signal", svid=svid, sig_num=sig_num)
            return None

        system, prn = decode_svid(svid)
        freq2 = self._resolve_freq(sig_num, svid, freq_nr_cache)

        code_msb_signed, doppler_msb_signed = decode_offsets_msb(int(t2["OffsetMSB"]))
        code_offset_lsb = int(t2["CodeOffsetLSB"])
        doppler_offset_lsb = int(t2["DopplerOffsetLSB"])
        carrier_msb = int(t2["CarrierMSB"])
        carrier_lsb = int(t2["CarrierLSB"])

        pr2 = pr2_m(pr1, code_msb_signed, code_offset_lsb)

        d2: pint.Quantity | None = None
        if freq2 is not None:
            d2 = doppler2_hz(d1, doppler_msb_signed, doppler_offset_lsb, freq2, freq1)

        ph: float | None = None
        if pr2 is not None and freq2 is not None:
            ph = phase_cycles(pr2, carrier_msb, carrier_lsb, freq2)

        return SbfSignalObs(
            svid=svid,
            system=system,
            prn=prn,
            signal_num=sig_num,
            signal_type=sig_def.signal_type,
            rx_channel=int(t2.get("RxChannel", 0)),
            lock_time_ms=int(t2["LockTime"]),
            cn0=cn0_dbhz(int(t2["CN0"]), sig_num),
            pseudorange=pr2,
            doppler=d2,
            phase_cycles=ph,
            obs_info=obs_info,
            is_type2=True,
        )

    def __repr__(self) -> str:
        """Return a short string representation."""
        return f"SbfReader(file='{self.fpath.name}', epochs={self.num_epochs})"

file_hash cached property

SHA-256 hex digest of the file (first 16 characters).

Returns

str 16-character hexadecimal prefix of the SHA-256 hash.
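
A minimal stand-in for this property using only the standard library; the chunked read and the helper name ``file_hash_prefix`` are illustrative assumptions, not the reader's actual implementation:

```python
import hashlib
from pathlib import Path

def file_hash_prefix(fpath: Path, n: int = 16) -> str:
    """First n hex characters of the file's SHA-256 digest."""
    h = hashlib.sha256()
    with open(fpath, "rb") as f:
        # Read in 1 MiB chunks so large SBF files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:n]
```

A 16-hex-character (64-bit) prefix is ample for deduplicating a store of observation files.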

start_time cached property

Return the timestamp of the first decoded epoch.

Returns

datetime Timezone-aware UTC datetime of the first observation epoch.

Raises

LookupError If the file contains no decodable epochs.

end_time cached property

Return the timestamp of the last decoded epoch.

Returns

datetime Timezone-aware UTC datetime of the last observation epoch.

Raises

LookupError If the file contains no decodable epochs.

systems cached property

Return sorted list of GNSS system codes present in the file.

Returns

list of str Sorted list of RINEX system letters (e.g. ["E", "G", "R"]).

num_satellites cached property

Return the number of unique satellites observed in the file.

Returns

int Count of unique system + PRN pairs across all epochs.

num_epochs cached property

Count the number of MeasEpoch blocks in the file.

Returns

int Total MeasEpoch block count (one per observation epoch).

Notes

Scans the entire file once; result is cached.

header cached property

Parse the first ReceiverSetup block in the file.

Returns

SbfHeader Receiver metadata.

Raises

LookupError If no ReceiverSetup block is found.

iter_epochs()

Iterate over decoded MeasEpoch blocks.

Yields decoded :class:SbfEpoch objects with all signal observations converted to physical units as :class:pint.Quantity.

Yields

SbfEpoch One decoded observation epoch.

Notes

- The file is scanned from start to finish on each call.
- The :attr:`_freq_nr_cache` is pre-populated from ALL ChannelStatus blocks before the first call, so all GLONASS FDMA epochs have accurate carrier frequencies.
- delta_ls (leap seconds) is taken from the most recent ReceiverTime block; defaults to 18 if none has been seen yet.
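
The TOW/WN timestamp arithmetic behind these epochs can be sketched as a simplified stand-in for the ``_tow_wn_to_utc`` helper; the GPS epoch (1980-01-06) and the leap-second subtraction follow the standard GPS-to-UTC convention rather than being copied from the source:

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

def tow_wn_to_utc(tow_ms: int, wn: int, delta_ls: int) -> datetime:
    """GPS time-of-week (ms) + continuous week number -> UTC datetime."""
    gps_time = GPS_EPOCH + timedelta(weeks=wn, milliseconds=tow_ms)
    # GPS time runs ahead of UTC by the accumulated leap seconds.
    return gps_time - timedelta(seconds=delta_ls)
```

WNc in SBF is a continuous week count (no 1024-week rollover), so no rollover handling is needed here.
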
Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def iter_epochs(self) -> Iterator[SbfEpoch]:
    """Iterate over decoded MeasEpoch blocks.

    Yields decoded :class:`SbfEpoch` objects with all signal observations
    converted to physical units as :class:`pint.Quantity`.

    Yields
    ------
    SbfEpoch
        One decoded observation epoch.

    Notes
    -----
    - The file is scanned from start to finish on each call.
    - The :attr:`_freq_nr_cache` is pre-populated from ALL ChannelStatus
      blocks before the first call, so all GLONASS FDMA epochs have
      accurate carrier frequencies.
    - ``delta_ls`` (leap seconds) is taken from the most recent
      ReceiverTime block; defaults to 18 if none has been seen yet.
    """
    parser = sbf_parser.SbfParser()
    freq_nr_cache: dict[int, int] = self._freq_nr_cache.copy()
    delta_ls: int = _DEFAULT_DELTA_LS

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        freq_nr_cache[svid] = int(sat["FreqNr"])

            case "MeasEpoch":
                epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                if epoch is not None:
                    yield epoch

to_ds(keep_data_vars=None, pad_global_sid=True, strip_fillval=True, **kwargs)

Convert SBF observations to an (epoch, sid) xarray Dataset.

Produces the same structure as :class:~canvod.readers.rinex.v3_04.Rnxv3Obs and passes :func:~canvod.readers.base.validate_dataset.

Parameters

keep_data_vars : list of str, optional
    Data variables to retain. If None, all five variables are kept: SNR, Pseudorange, Phase, Doppler, SSI. Note: LLI is not produced — SBF has no loss-of-lock indicator.
pad_global_sid : bool, default True
    If True, pads the dataset to the global SID space via :func:canvod.auxiliary.preprocessing.pad_to_global_sid.
strip_fillval : bool, default True
    If True, removes fill values via :func:canvod.auxiliary.preprocessing.strip_fillvalue.
**kwargs
    Ignored (for ABC compatibility).

Returns

xr.Dataset Dataset with dimensions (epoch, sid) that passes :func:~canvod.readers.base.validate_dataset.

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    pad_global_sid: bool = True,
    strip_fillval: bool = True,
    **kwargs: object,
) -> xr.Dataset:
    """Convert SBF observations to an ``(epoch, sid)`` xarray Dataset.

    Produces the same structure as :class:`~canvod.readers.rinex.v3_04.Rnxv3Obs`
    and passes :func:`~canvod.readers.base.validate_dataset`.

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to retain.  If ``None``, all five variables are
        kept: ``SNR``, ``Pseudorange``, ``Phase``, ``Doppler``, ``SSI``.
        Note: ``LLI`` is not produced — SBF has no loss-of-lock indicator.
    pad_global_sid : bool, default True
        If ``True``, pads the dataset to the global SID space via
        :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.
    strip_fillval : bool, default True
        If ``True``, removes fill values via
        :func:`canvod.auxiliary.preprocessing.strip_fillvalue`.
    **kwargs
        Ignored (for ABC compatibility).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions ``(epoch, sid)`` that passes
        :func:`~canvod.readers.base.validate_dataset`.
    """
    import math

    freq_nr_cache = self._freq_nr_cache.copy()

    # --- Single pass: collect timestamps, SID properties, and per-epoch obs ---
    # Stores per-epoch obs as dicts (SID → value) so we only scan the file once.
    # Array construction happens afterwards in fast in-memory loops.
    sid_props: dict[str, dict[str, Any]] = {}
    timestamps: list[np.datetime64] = []
    # Per-epoch accumulator: list of (snr_dict, pr_dict, ph_dict, dop_dict)
    epoch_rows: list[
        tuple[
            dict[str, float], dict[str, float], dict[str, float], dict[str, float]
        ]
    ] = []

    for epoch in self.iter_epochs():
        ts_np = np.datetime64(epoch.timestamp.replace(tzinfo=None), "ns")
        timestamps.append(ts_np)

        e_snr: dict[str, float] = {}
        e_pr: dict[str, float] = {}
        e_ph: dict[str, float] = {}
        e_dop: dict[str, float] = {}

        for obs in epoch.observations:
            props = _sid_props_from_obs(obs.svid, obs.signal_num, freq_nr_cache)
            if props is None:
                continue
            sid = props["sid"]
            if sid not in sid_props:
                sid_props[sid] = props
            if obs.cn0 is not None:
                e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
            if obs.pseudorange is not None:
                e_pr[sid] = float(obs.pseudorange.to(UREG.meter).magnitude)
            if obs.phase_cycles is not None:
                e_ph[sid] = obs.phase_cycles
            if obs.doppler is not None:
                e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)

        epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

    sorted_sids = sorted(sid_props)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(timestamps)
    n_sids = len(sorted_sids)

    # Allocate arrays (LLI is dropped — SBF has no loss-of-lock indicator)
    snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
    pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
    ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
    dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
    ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

    for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
        for sid, val in e_snr.items():
            snr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_pr.items():
            pr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_ph.items():
            ph_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_dop.items():
            dop_arr[t_idx, sid_to_idx[sid]] = val

    # Build coordinate arrays
    freq_center = np.asarray(
        [sid_props[s]["freq_center"] for s in sorted_sids],
        dtype=DTYPES["freq_center"],
    )
    freq_min = np.asarray(
        [sid_props[s]["freq_min"] for s in sorted_sids], dtype=DTYPES["freq_min"]
    )
    freq_max = np.asarray(
        [sid_props[s]["freq_max"] for s in sorted_sids], dtype=DTYPES["freq_max"]
    )

    coords: dict[str, Any] = {
        "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    attrs = cast(dict[str, Any], self._build_attrs())

    # Add ECEF position from ReceiverSetup header for pipeline compatibility.
    # ECEFPosition.from_ds_metadata() reads "APPROX POSITION X/Y/Z".
    try:
        import pymap3d as pm

        hdr = self.header
        lat_deg = math.degrees(hdr.latitude_rad)
        lon_deg = math.degrees(hdr.longitude_rad)
        h_m = float(hdr.height_m.to(UREG.meter).magnitude)
        x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
        attrs["APPROX POSITION X"] = float(x)
        attrs["APPROX POSITION Y"] = float(y)
        attrs["APPROX POSITION Z"] = float(z)
    except (LookupError, AttributeError):
        pass  # SBF file without a ReceiverSetup block

    ds = xr.Dataset(
        data_vars={
            "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
            "Pseudorange": (
                ["epoch", "sid"],
                pr_arr,
                OBSERVABLES_METADATA["Pseudorange"],
            ),
            "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
            "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
            "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
        },
        coords=coords,
        attrs=attrs,
    )

    # Post-process
    if keep_data_vars is not None:
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        ds = pad_to_global_sid(
            ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    validate_dataset(ds, required_vars=keep_data_vars)
    return ds

to_metadata_ds(pad_global_sid=True, **kwargs)

Decode SBF metadata blocks to an (epoch, sid) xarray Dataset.

Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and MeasExtra blocks in a single file scan.

Parameters

pad_global_sid : bool, default True
    If True, pads to the global SID space via :func:canvod.auxiliary.preprocessing.pad_to_global_sid.

Returns

xr.Dataset Dataset with dimensions (epoch, sid). Epoch-level scalars (PDOP, NrSV, …) are 1-D (epoch,) coordinates. Satellite geometry (theta, phi) and signal quality (MPCorrection, …) are (epoch, sid) data variables.
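
The per-satellite-to-per-signal broadcast used for the theta/phi geometry can be sketched without xarray; SatVisibility reports one elevation/azimuth pair per satellite, which is copied to every SID of that satellite (the SIDs and angles below are toy values):

```python
import math

# Each sid is "SV|band|code"; one (theta, phi) per SV is broadcast to all sids.
sids = ["G01|L1|C", "G01|L2|W", "E05|E1|C"]
satvis = {"G01": (32.5, 118.0)}  # sv -> (theta_deg, phi_deg), toy values

nan_pair = (math.nan, math.nan)
theta = [satvis.get(sid.split("|")[0], nan_pair)[0] for sid in sids]
print(theta)  # [32.5, 32.5, nan]
```

Satellites with no SatVisibility entry in that epoch keep the NaN fill, matching the pre-allocated ``np.full(..., np.nan)`` arrays in the source.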

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_metadata_ds(
    self, pad_global_sid: bool = True, **kwargs: object
) -> xr.Dataset:
    """Decode SBF metadata blocks to an ``(epoch, sid)`` xarray Dataset.

    Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and
    MeasExtra blocks in a single file scan.

    Parameters
    ----------
    pad_global_sid : bool, default True
        If ``True``, pads to the global SID space via
        :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions ``(epoch, sid)``.  Epoch-level scalars
        (PDOP, NrSV, …) are 1-D ``(epoch,)`` coordinates.  Satellite
        geometry (theta, phi) and signal quality (MPCorrection, …) are
        ``(epoch, sid)`` data variables.
    """
    parser = sbf_parser.SbfParser()
    freq_nr_cache = self._freq_nr_cache.copy()

    pending: dict[str, Any] = {
        "pvt": None,
        "dop": None,
        "status": None,
        "satvis": [],
        "extra": [],
    }

    # Each record: (ts, pvt, dop, status, satvis, extra, obs_map)
    records: list[tuple[Any, ...]] = []

    # sid discovery — same logic as to_ds() pass 1
    sid_props: dict[str, dict[str, Any]] = {}

    delta_ls: int = _DEFAULT_DELTA_LS

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid_cs = int(sat["SVID"])
                    if svid_cs != 0:
                        freq_nr_cache[svid_cs] = int(sat["FreqNr"])

            case "PVTGeodetic":
                pending["pvt"] = data

            case "DOP":
                pending["dop"] = data

            case "ReceiverStatus":
                pending["status"] = data

            case "SatVisibility":
                pending["satvis"] = list(data.get("SatInfo", []))

            case "MeasExtra":
                pending["extra"] = list(data.get("MeasExtraChannel", []))

            case "MeasEpoch":
                tow_ms = int(data["TOW"])
                wn = int(data["WNc"])
                ts = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                obs_map = _build_obs_map(data)

                # Discover sids from Type1 and Type2 sub-blocks
                for t1 in data.get("Type_1", []):
                    svid1 = int(t1["SVID"])
                    props1 = _sid_props_from_obs(
                        svid1,
                        decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                        freq_nr_cache,
                    )
                    if props1 is not None and props1["sid"] not in sid_props:
                        sid_props[props1["sid"]] = props1

                    for t2 in t1.get("Type_2", []):
                        props2 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props2 is not None and props2["sid"] not in sid_props:
                            sid_props[props2["sid"]] = props2

                records.append(
                    (
                        ts,
                        pending["pvt"],
                        pending["dop"],
                        pending["status"],
                        list(pending["satvis"]),
                        list(pending["extra"]),
                        obs_map,
                    )
                )
                pending = {
                    "pvt": None,
                    "dop": None,
                    "status": None,
                    "satvis": [],
                    "extra": [],
                }

    # Build index structures
    sorted_sids = sorted(sid_props)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(records)
    n_sids = len(sorted_sids)

    # sv → list of sid indices (for SatVisibility broadcasting)
    sids_for_sv: dict[str, list[int]] = {}
    for sid in sorted_sids:
        sv = sid_props[sid]["sv"]
        sids_for_sv.setdefault(sv, []).append(sid_to_idx[sid])

    # (epoch, sid) data variable arrays
    theta_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    phi_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    rise_set_arr = np.full((n_epochs, n_sids), -1, dtype=np.int8)
    mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    smoothing_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    code_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    carr_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    lock_time_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    cum_loss_cont_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    car_mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    cn0_highres_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

    # (epoch,) scalar coordinate arrays
    pdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    hdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    vdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    n_sv_arr = np.full(n_epochs, -1, dtype=np.int16)
    h_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    v_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    pvt_mode_arr = np.full(n_epochs, -1, dtype=np.int8)
    mean_corr_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    cpu_load_arr = np.full(n_epochs, -1, dtype=np.int8)
    temp_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    rx_error_arr = np.full(n_epochs, 0, dtype=np.int32)

    timestamps: list[np.datetime64] = []

    # Fill arrays from records
    for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
        timestamps.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

        # DOP block → pdop, hdop, vdop
        if dop is not None:
            try:
                pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        # PVTGeodetic → n_sv, accuracy, mode, correction age
        if pvt is not None:
            try:
                n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                raw_hacc = int(pvt["HAccuracy"])
                if raw_hacc != 65535:
                    h_acc_arr[t_idx] = raw_hacc * 0.01
                raw_vacc = int(pvt["VAccuracy"])
                if raw_vacc != 65535:
                    v_acc_arr[t_idx] = raw_vacc * 0.01
                pvt_mode_arr[t_idx] = int(pvt["Mode"])
                mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                # Also pick up DOP from PVTGeodetic if DOP block absent
                if np.isnan(pdop_arr[t_idx]):
                    pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        # ReceiverStatus → cpu_load, temperature, rx_error
        if status is not None:
            try:
                cpu_load_arr[t_idx] = int(status["CPULoad"])
                raw_temp = int(status["Temperature"])
                if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                    temp_arr[t_idx] = float(raw_temp - 100)
                rx_error_arr[t_idx] = int(status["RxError"])
            except (KeyError, TypeError, ValueError):
                pass

        # SatVisibility → broadcast theta/phi to all sids for that sv
        for sat_info in satvis:
            try:
                svid_raw = int(sat_info["SVID"])
                sys_code, prn = decode_svid(svid_raw)
                sv = f"{sys_code}{prn:02d}"
                theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                phi_deg = int(sat_info["Azimuth"]) * 0.01
                rs = int(sat_info["RiseSet"])
                for s_idx in sids_for_sv.get(sv, []):
                    theta_arr[t_idx, s_idx] = theta_deg
                    phi_arr[t_idx, s_idx] = phi_deg
                    rise_set_arr[t_idx, s_idx] = rs
            except (KeyError, TypeError, ValueError):
                pass

        # MeasExtra → per-(epoch, sid) signal quality
        for ch in extra:
            try:
                type_byte = int(ch["Type"])
                info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                sig_num = decode_signal_num(type_byte, info_byte)
                rx_ch = int(ch["RxChannel"])
                svid = obs_map.get((rx_ch, sig_num))
                if svid is None:
                    continue
                sig_def = SIGNAL_TABLE.get(sig_num)
                if sig_def is None:
                    continue
                sys_code2, prn2 = decode_svid(svid)
                sv2 = f"{sys_code2}{prn2:02d}"
                sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                s_idx = sid_to_idx.get(sid)
                if s_idx is None:
                    continue
                mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                # SmoothingCorr: i2, scale 0.001 m/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                raw_sc = ch.get("SmoothingCorr")
                if raw_sc is not None:
                    smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                raw_cv = ch.get("CodeVar")
                # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cv is not None and int(raw_cv) != 65535:
                    code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                raw_rv = ch.get("CarrierVar")
                # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_rv is not None and int(raw_rv) != 65535:
                    carr_var_arr[t_idx, s_idx] = float(raw_rv)
                raw_lt = ch.get("LockTime")
                # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_lt is not None and int(raw_lt) != 65535:
                    lock_time_arr[t_idx, s_idx] = float(raw_lt)
                raw_clc = ch.get("CumLossCont")
                # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_clc is not None:
                    cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                raw_cmc = ch.get("CarMPCorr")
                # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cmc is not None:
                    car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                raw_misc = ch.get("Misc")
                # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_misc is not None:
                    cn0_hr = int(raw_misc) & 0x07
                    cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
            except (KeyError, TypeError, ValueError):
                pass

    # Build Dataset
    freq_center = np.asarray(
        [sid_props[s]["freq_center"] for s in sorted_sids], dtype=np.float32
    )
    freq_min = np.asarray(
        [sid_props[s]["freq_min"] for s in sorted_sids], dtype=np.float32
    )
    freq_max = np.asarray(
        [sid_props[s]["freq_max"] for s in sorted_sids], dtype=np.float32
    )

    coords: dict[str, Any] = {
        "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        # Epoch-level scalars (1-D over epoch)
        "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
        "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
        "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
        "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
        "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
        "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
        "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
        "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
        "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
        "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
        "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
    }

    attrs = self._build_attrs()

    ds = xr.Dataset(
        data_vars={
            "broadcast_theta": (
                ["epoch", "sid"],
                np.deg2rad(theta_arr),
                _BROADCAST_THETA_ATTRS,
            ),
            "broadcast_phi": (
                ["epoch", "sid"],
                np.deg2rad(phi_arr),
                _BROADCAST_PHI_ATTRS,
            ),
            "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
            "mp_correction_m": (
                ["epoch", "sid"],
                mp_corr_arr,
                _MP_CORRECTION_ATTRS,
            ),
            "smoothing_corr_m": (
                ["epoch", "sid"],
                smoothing_corr_arr,
                _SMOOTHING_CORR_ATTRS,
            ),
            "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
            "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
            "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
            "cum_loss_cont": (
                ["epoch", "sid"],
                cum_loss_cont_arr,
                _CUM_LOSS_CONT_ATTRS,
            ),
            "car_mp_corr_cycles": (
                ["epoch", "sid"],
                car_mp_corr_arr,
                _CAR_MP_CORR_ATTRS,
            ),
            "cn0_highres_correction": (
                ["epoch", "sid"],
                cn0_highres_arr,
                _CN0_HIGHRES_CORRECTION_ATTRS,
            ),
        },
        coords=coords,
        attrs=attrs,
    )

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        ds = pad_to_global_sid(
            ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    return ds

to_ds_and_auxiliary(keep_data_vars=None, pad_global_sid=True, strip_fillval=True, store_raw_observables=True, **kwargs)

Single file scan producing both the obs dataset and the SBF metadata dataset.

Performs ONE parser.read() pass, collecting MeasEpoch observations and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously. to_ds() and to_metadata_ds() remain unchanged for standalone use.

Parameters

keep_data_vars : list of str, optional
    Data variables to retain in the obs dataset.
pad_global_sid : bool, default True
    Pad obs dataset to the global SID space.
strip_fillval : bool, default True
    Strip fill values from the obs dataset.
store_raw_observables : bool, default True
    Add pre-correction "raw" observable variables to the obs dataset: SNR_raw, Pseudorange_unsmoothed, Pseudorange_raw, Phase_raw. Set to False to reduce dataset size when these are not needed.
**kwargs
    Forwarded to pad_to_global_sid (e.g. keep_sids).

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"sbf_obs": meta_ds}).
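
The metadata-side epoch timestamps come from `_tow_wn_to_utc(tow_ms, wn, delta_ls)`. A hedged sketch of that conversion, assuming the standard GPS epoch (1980-01-06) and that `DeltaLS` is the GPS-UTC leap-second count — the actual helper may handle edge cases differently:

```python
from datetime import datetime, timedelta, timezone

# Standard GPS epoch; WNc in SBF block headers is the continuous week number.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def tow_wn_to_utc(tow_ms: int, wn: int, delta_ls: int) -> datetime:
    # GPS time = epoch + whole weeks + time-of-week in milliseconds;
    # UTC = GPS time minus the GPS-UTC leap-second offset (DeltaLS).
    gps_time = GPS_EPOCH + timedelta(weeks=wn, milliseconds=tow_ms)
    return gps_time - timedelta(seconds=delta_ls)
```

For instance, TOW 0 of week 1 with DeltaLS 18 lands 18 seconds before 1980-01-13 00:00:00 UTC.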

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    pad_global_sid: bool = True,
    strip_fillval: bool = True,
    store_raw_observables: bool = True,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Single file scan producing both the obs dataset and the SBF metadata dataset.

    Performs ONE ``parser.read()`` pass, collecting MeasEpoch observations
    and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously.
    ``to_ds()`` and ``to_metadata_ds()`` remain unchanged for standalone use.

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to retain in the obs dataset.
    pad_global_sid : bool, default True
        Pad obs dataset to the global SID space.
    strip_fillval : bool, default True
        Strip fill values from the obs dataset.
    store_raw_observables : bool, default True
        Add pre-correction "raw" observable variables to the obs dataset:
        ``SNR_raw``, ``Pseudorange_unsmoothed``, ``Pseudorange_raw``,
        ``Phase_raw``.  Set to ``False`` to reduce dataset size when these
        are not needed.
    **kwargs
        Forwarded to ``pad_to_global_sid`` (e.g. ``keep_sids``).

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"sbf_obs": meta_ds})``.
    """
    import math

    parser = sbf_parser.SbfParser()
    freq_nr_cache = self._freq_nr_cache.copy()
    delta_ls: int = _DEFAULT_DELTA_LS

    # Separate sid discovery for obs (matches to_ds) and metadata (matches to_metadata_ds)
    sid_props_obs: dict[str, dict[str, Any]] = {}
    sid_props_meta: dict[str, dict[str, Any]] = {}

    # Obs-side accumulators (same as to_ds)
    timestamps_obs: list[np.datetime64] = []
    epoch_rows: list[
        tuple[
            dict[str, float], dict[str, float], dict[str, float], dict[str, float]
        ]
    ] = []

    # Metadata-side accumulators (same as to_metadata_ds)
    pending: dict[str, Any] = {
        "pvt": None,
        "dop": None,
        "status": None,
        "satvis": [],
        "extra": [],
    }
    records: list[tuple[Any, ...]] = []

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        freq_nr_cache[svid] = int(sat["FreqNr"])

            case "PVTGeodetic":
                pending["pvt"] = data

            case "DOP":
                pending["dop"] = data

            case "ReceiverStatus":
                pending["status"] = data

            case "SatVisibility":
                pending["satvis"] = list(data.get("SatInfo", []))

            case "MeasExtra":
                pending["extra"] = list(data.get("MeasExtraChannel", []))

            case "MeasEpoch":
                # --- Obs side ---
                epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                if epoch is not None:
                    ts_np = np.datetime64(
                        epoch.timestamp.replace(tzinfo=None), "ns"
                    )
                    timestamps_obs.append(ts_np)
                    e_snr: dict[str, float] = {}
                    e_pr: dict[str, float] = {}
                    e_ph: dict[str, float] = {}
                    e_dop: dict[str, float] = {}
                    for obs in epoch.observations:
                        props = _sid_props_from_obs(
                            obs.svid, obs.signal_num, freq_nr_cache
                        )
                        if props is None:
                            continue
                        sid = props["sid"]
                        if sid not in sid_props_obs:
                            sid_props_obs[sid] = props
                        if obs.cn0 is not None:
                            e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                        if obs.pseudorange is not None:
                            e_pr[sid] = float(
                                obs.pseudorange.to(UREG.meter).magnitude
                            )
                        if obs.phase_cycles is not None:
                            e_ph[sid] = obs.phase_cycles
                        if obs.doppler is not None:
                            e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)
                    epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

                # --- Metadata side (always, even if epoch decoded as None) ---
                tow_ms = int(data["TOW"])
                wn = int(data["WNc"])
                ts_meta = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                obs_map = _build_obs_map(data)

                # Discover sids from Type1/Type2 sub-blocks (same as to_metadata_ds)
                for t1 in data.get("Type_1", []):
                    svid1 = int(t1["SVID"])
                    props1 = _sid_props_from_obs(
                        svid1,
                        decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                        freq_nr_cache,
                    )
                    if props1 is not None and props1["sid"] not in sid_props_meta:
                        sid_props_meta[props1["sid"]] = props1
                    for t2 in t1.get("Type_2", []):
                        props2 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if (
                            props2 is not None
                            and props2["sid"] not in sid_props_meta
                        ):
                            sid_props_meta[props2["sid"]] = props2

                records.append(
                    (
                        ts_meta,
                        pending["pvt"],
                        pending["dop"],
                        pending["status"],
                        list(pending["satvis"]),
                        list(pending["extra"]),
                        obs_map,
                    )
                )
                pending = {
                    "pvt": None,
                    "dop": None,
                    "status": None,
                    "satvis": [],
                    "extra": [],
                }

    # ----------------------------------------------------------------
    # Build obs dataset (verbatim from to_ds())
    # ----------------------------------------------------------------
    sorted_sids = sorted(sid_props_obs)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(timestamps_obs)
    n_sids = len(sorted_sids)

    snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
    pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
    ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
    dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
    ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

    for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
        for sid, val in e_snr.items():
            snr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_pr.items():
            pr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_ph.items():
            ph_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_dop.items():
            dop_arr[t_idx, sid_to_idx[sid]] = val

    freq_center = np.asarray(
        [sid_props_obs[s]["freq_center"] for s in sorted_sids],
        dtype=DTYPES["freq_center"],
    )
    freq_min = np.asarray(
        [sid_props_obs[s]["freq_min"] for s in sorted_sids],
        dtype=DTYPES["freq_min"],
    )
    freq_max = np.asarray(
        [sid_props_obs[s]["freq_max"] for s in sorted_sids],
        dtype=DTYPES["freq_max"],
    )

    coords_obs: dict[str, Any] = {
        "epoch": ("epoch", timestamps_obs, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props_obs[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array(
                [sid_props_obs[s]["system"] for s in sorted_sids], dtype=object
            ),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props_obs[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props_obs[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    attrs = cast(dict[str, Any], self._build_attrs())

    try:
        import pymap3d as pm

        hdr = self.header
        lat_deg = math.degrees(hdr.latitude_rad)
        lon_deg = math.degrees(hdr.longitude_rad)
        h_m = float(hdr.height_m.to(UREG.meter).magnitude)
        x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
        attrs["APPROX POSITION X"] = float(x)
        attrs["APPROX POSITION Y"] = float(y)
        attrs["APPROX POSITION Z"] = float(z)
    except (LookupError, AttributeError):
        pass

    obs_ds = xr.Dataset(
        data_vars={
            "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
            "Pseudorange": (
                ["epoch", "sid"],
                pr_arr,
                OBSERVABLES_METADATA["Pseudorange"],
            ),
            "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
            "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
            "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
        },
        coords=coords_obs,
        attrs=attrs,
    )

    if keep_data_vars is not None:
        for var in list(obs_ds.data_vars):
            if var not in keep_data_vars:
                obs_ds = obs_ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        obs_ds = pad_to_global_sid(
            obs_ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        obs_ds = strip_fillvalue(obs_ds)

    validate_dataset(obs_ds, required_vars=keep_data_vars)

    # ----------------------------------------------------------------
    # Build metadata dataset (verbatim from to_metadata_ds())
    # ----------------------------------------------------------------
    sorted_sids_meta = sorted(sid_props_meta)
    sid_to_idx_meta = {sid: i for i, sid in enumerate(sorted_sids_meta)}
    n_epochs_meta = len(records)
    n_sids_meta = len(sorted_sids_meta)

    sids_for_sv: dict[str, list[int]] = {}
    for sid in sorted_sids_meta:
        sv = sid_props_meta[sid]["sv"]
        sids_for_sv.setdefault(sv, []).append(sid_to_idx_meta[sid])

    theta_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    phi_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    rise_set_arr = np.full((n_epochs_meta, n_sids_meta), -1, dtype=np.int8)
    mp_corr_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    smoothing_corr_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    code_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    carr_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    lock_time_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    cum_loss_cont_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    car_mp_corr_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    cn0_highres_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )

    pdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    hdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    vdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    n_sv_arr = np.full(n_epochs_meta, -1, dtype=np.int16)
    h_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    v_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    pvt_mode_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
    mean_corr_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    cpu_load_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
    temp_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    rx_error_arr = np.full(n_epochs_meta, 0, dtype=np.int32)

    timestamps_meta: list[np.datetime64] = []

    for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
        timestamps_meta.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

        if dop is not None:
            try:
                pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        if pvt is not None:
            try:
                n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                raw_hacc = int(pvt["HAccuracy"])
                if raw_hacc != 65535:
                    h_acc_arr[t_idx] = raw_hacc * 0.01
                raw_vacc = int(pvt["VAccuracy"])
                if raw_vacc != 65535:
                    v_acc_arr[t_idx] = raw_vacc * 0.01
                pvt_mode_arr[t_idx] = int(pvt["Mode"])
                mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                if np.isnan(pdop_arr[t_idx]):
                    pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        if status is not None:
            try:
                cpu_load_arr[t_idx] = int(status["CPULoad"])
                raw_temp = int(status["Temperature"])
                if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                    temp_arr[t_idx] = float(raw_temp - 100)
                rx_error_arr[t_idx] = int(status["RxError"])
            except (KeyError, TypeError, ValueError):
                pass

        for sat_info in satvis:
            try:
                svid_raw = int(sat_info["SVID"])
                sys_code, prn = decode_svid(svid_raw)
                sv = f"{sys_code}{prn:02d}"
                theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                phi_deg = int(sat_info["Azimuth"]) * 0.01
                rs = int(sat_info["RiseSet"])
                for s_idx in sids_for_sv.get(sv, []):
                    theta_arr[t_idx, s_idx] = theta_deg
                    phi_arr[t_idx, s_idx] = phi_deg
                    rise_set_arr[t_idx, s_idx] = rs
            except (KeyError, TypeError, ValueError):
                pass

        for ch in extra:
            try:
                type_byte = int(ch["Type"])
                info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                sig_num = decode_signal_num(type_byte, info_byte)
                rx_ch = int(ch["RxChannel"])
                svid = obs_map.get((rx_ch, sig_num))
                if svid is None:
                    continue
                sig_def = SIGNAL_TABLE.get(sig_num)
                if sig_def is None:
                    continue
                sys_code2, prn2 = decode_svid(svid)
                sv2 = f"{sys_code2}{prn2:02d}"
                sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                s_idx = sid_to_idx_meta.get(sid)
                if s_idx is None:
                    continue
                mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                # SmoothingCorr: i2, scale 0.001 m/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                raw_sc = ch.get("SmoothingCorr")
                if raw_sc is not None:
                    smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                raw_cv = ch.get("CodeVar")
                # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cv is not None and int(raw_cv) != 65535:
                    code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                raw_rv = ch.get("CarrierVar")
                # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_rv is not None and int(raw_rv) != 65535:
                    carr_var_arr[t_idx, s_idx] = float(raw_rv)
                raw_lt = ch.get("LockTime")
                # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_lt is not None and int(raw_lt) != 65535:
                    lock_time_arr[t_idx, s_idx] = float(raw_lt)
                raw_clc = ch.get("CumLossCont")
                # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_clc is not None:
                    cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                raw_cmc = ch.get("CarMPCorr")
                # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cmc is not None:
                    car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                raw_misc = ch.get("Misc")
                # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_misc is not None:
                    cn0_hr = int(raw_misc) & 0x07
                    cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
            except (KeyError, TypeError, ValueError):
                pass

    freq_center_meta = np.asarray(
        [sid_props_meta[s]["freq_center"] for s in sorted_sids_meta],
        dtype=np.float32,
    )
    freq_min_meta = np.asarray(
        [sid_props_meta[s]["freq_min"] for s in sorted_sids_meta], dtype=np.float32
    )
    freq_max_meta = np.asarray(
        [sid_props_meta[s]["freq_max"] for s in sorted_sids_meta], dtype=np.float32
    )

    coords_meta: dict[str, Any] = {
        "epoch": ("epoch", timestamps_meta, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            sorted_sids_meta, dims=["sid"], attrs=COORDS_METADATA["sid"]
        ),
        "sv": (
            "sid",
            [sid_props_meta[s]["sv"] for s in sorted_sids_meta],
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            [sid_props_meta[s]["system"] for s in sorted_sids_meta],
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            [sid_props_meta[s]["band"] for s in sorted_sids_meta],
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            [sid_props_meta[s]["code"] for s in sorted_sids_meta],
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center_meta, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min_meta, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max_meta, COORDS_METADATA["freq_max"]),
        "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
        "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
        "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
        "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
        "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
        "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
        "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
        "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
        "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
        "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
        "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
    }

    attrs_meta = self._build_attrs()

    meta_ds = xr.Dataset(
        data_vars={
            "broadcast_theta": (
                ["epoch", "sid"],
                np.deg2rad(theta_arr),
                _BROADCAST_THETA_ATTRS,
            ),
            "broadcast_phi": (
                ["epoch", "sid"],
                np.deg2rad(phi_arr),
                _BROADCAST_PHI_ATTRS,
            ),
            "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
            "mp_correction_m": (
                ["epoch", "sid"],
                mp_corr_arr,
                _MP_CORRECTION_ATTRS,
            ),
            "smoothing_corr_m": (
                ["epoch", "sid"],
                smoothing_corr_arr,
                _SMOOTHING_CORR_ATTRS,
            ),
            "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
            "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
            "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
            "cum_loss_cont": (
                ["epoch", "sid"],
                cum_loss_cont_arr,
                _CUM_LOSS_CONT_ATTRS,
            ),
            "car_mp_corr_cycles": (
                ["epoch", "sid"],
                car_mp_corr_arr,
                _CAR_MP_CORR_ATTRS,
            ),
            "cn0_highres_correction": (
                ["epoch", "sid"],
                cn0_highres_arr,
                _CN0_HIGHRES_CORRECTION_ATTRS,
            ),
        },
        coords=coords_meta,
        attrs=attrs_meta,
    )

    # Align meta_ds SID to obs_ds SID.
    # obs uses sid_props_obs (MeasEpoch); meta uses sid_props_meta (Type1/Type2)
    # — they can diverge.  Reindex fills missing SIDs with NaN.
    meta_ds = meta_ds.reindex(sid=obs_ds.sid, fill_value=np.nan)
    # rise_set is int8 with sentinel -1; NaN fill promotes to float — cast back.
    if meta_ds["rise_set"].dtype != np.int8:
        meta_ds["rise_set"] = meta_ds["rise_set"].fillna(-1).astype(np.int8)

    # Apply CN0HighRes correction from MeasExtra (Block 4000) to SNR.
    # CN0HighRes extends resolution from 0.25 to 0.03125 dB-Hz.
    # RefGuide-4.14.0, MeasExtra MeasExtraChannelSub.Misc bits 0-2, p.265.
    # Where MeasExtra was not logged the correction array is NaN → no-op.
    corr = meta_ds["cn0_highres_correction"].values  # (epoch, sid), NaN if absent
    snr_raw_values = obs_ds["SNR"].values.copy()  # preserve 0.25 dB-Hz original
    snr_corrected = snr_raw_values.copy()
    valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
    snr_corrected[valid] += corr[valid]
    snr_attrs = dict(obs_ds["SNR"].attrs)
    snr_attrs["comment"] = (
        snr_attrs.get("comment", "")
        + " CN0HighRes correction from MeasExtra (Block 4000, p.265) applied where"
        " available, improving resolution from 0.25 to 0.03125 dB-Hz."
    ).lstrip()
    obs_ds["SNR"] = xr.DataArray(
        snr_corrected,
        dims=["epoch", "sid"],
        coords=obs_ds["SNR"].coords,
        attrs=snr_attrs,
    )

    if store_raw_observables:
        # ------------------------------------------------------------------
        # Add "physically raw" observables: pre-correction versions of SNR,
        # pseudorange, and carrier phase.  NaN where MeasExtra was absent.
        # Gated by store_raw_observables (config: store_sbf_raw_observables).
        # ------------------------------------------------------------------

        # SNR_raw: 0.25 dB-Hz resolution, before CN0HighRes extension.
        obs_ds["SNR_raw"] = xr.DataArray(
            snr_raw_values,
            dims=["epoch", "sid"],
            coords=obs_ds["SNR"].coords,
            attrs=_SNR_RAW_ATTRS,
        )

        # Pseudorange_unsmoothed: Hatch-filter correction removed.
        smooth = meta_ds["smoothing_corr_m"].values
        pr_vals = obs_ds["Pseudorange"].values
        pr_unsmoothed = np.where(
            ~np.isnan(smooth), pr_vals + smooth, np.nan
        ).astype(np.float64)
        obs_ds["Pseudorange_unsmoothed"] = xr.DataArray(
            pr_unsmoothed,
            dims=["epoch", "sid"],
            coords=obs_ds["Pseudorange"].coords,
            attrs=_PSEUDORANGE_UNSMOOTHED_ATTRS,
        )

        # Pseudorange_raw: both Hatch-filter and multipath corrections removed.
        mp = meta_ds["mp_correction_m"].values
        available = ~np.isnan(smooth) & ~np.isnan(mp)
        pr_raw = np.where(available, pr_vals + smooth + mp, np.nan).astype(
            np.float64
        )
        obs_ds["Pseudorange_raw"] = xr.DataArray(
            pr_raw,
            dims=["epoch", "sid"],
            coords=obs_ds["Pseudorange"].coords,
            attrs=_PSEUDORANGE_RAW_ATTRS,
        )

        # Phase_raw: carrier multipath correction removed.
        car_mp = meta_ds["car_mp_corr_cycles"].values
        ph_vals = obs_ds["Phase"].values
        ph_raw = np.where(~np.isnan(car_mp), ph_vals + car_mp, np.nan).astype(
            np.float64
        )
        obs_ds["Phase_raw"] = xr.DataArray(
            ph_raw,
            dims=["epoch", "sid"],
            coords=obs_ds["Phase"].coords,
            attrs=_PHASE_RAW_ATTRS,
        )

    return obs_ds, {"sbf_obs": meta_ds}
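The NaN-masked, in-place SNR correction performed above can be illustrated with a small self-contained sketch (the array values here are made up for illustration):

```python
import numpy as np

# Toy (epoch, sid) arrays: SNR at 0.25 dB-Hz resolution, and the
# CN0HighRes correction (NaN where MeasExtra was not logged).
snr = np.array([[40.25, np.nan],
                [41.00, 42.50]])
corr = np.array([[0.09375, 0.03125],
                 [np.nan, 0.0]])

snr_corrected = snr.copy()
# Correct only where both the observation and the correction exist.
valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
snr_corrected[valid] += corr[valid]
```

Missing SNR samples stay NaN, and epochs without MeasExtra keep the coarse 0.25 dB-Hz value, matching the "no-op where absent" behavior described in the comments.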

__repr__()

Return a short string representation.

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def __repr__(self) -> str:
    """Return a short string representation."""
    return f"SbfReader(file='{self.fpath.name}', epochs={self.num_epochs})"

DataDirMatcher

Match RINEX data directories for canopy and reference receivers.

Scans a root directory structure to find dates with RINEX files present in both canopy and reference receiver directories. Deprecated: initialization raises a DeprecationWarning; use canvod.virtualiconvname.FilenameMapper with DataDirectoryValidator instead.

Parameters

root : Path
    Root directory containing receiver subdirectories.
reference_pattern : Path, optional
    Relative path pattern for reference receiver
    (default: "01_reference/01_GNSS/01_raw").
canopy_pattern : Path, optional
    Relative path pattern for canopy receiver
    (default: "02_canopy/01_GNSS/01_raw").

Examples

from pathlib import Path
matcher = DataDirMatcher(
    root=Path("/data/01_Rosalia"),
    reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern=Path("02_canopy/01_GNSS/01_raw"),
)

Iterate over matched directories

for matched_dirs in matcher:
    print(matched_dirs.yyyydoy)
    rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    print(f"  Found {len(rinex_files)} RINEX files")

Get list of common dates

dates = matcher.get_common_dates()
print(f"Found {len(dates)} dates with data")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class DataDirMatcher:
    """Match RINEX data directories for canopy and reference receivers.

    Scans a root directory structure to find dates with RINEX files
    present in both canopy and reference receiver directories.

    Parameters
    ----------
    root : Path
        Root directory containing receiver subdirectories
    reference_pattern : Path, optional
        Relative path pattern for reference receiver
        (default: "01_reference/01_GNSS/01_raw")
    canopy_pattern : Path, optional
        Relative path pattern for canopy receiver
        (default: "02_canopy/01_GNSS/01_raw")

    Examples
    --------
    >>> from pathlib import Path
    >>> matcher = DataDirMatcher(
    ...     root=Path("/data/01_Rosalia"),
    ...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    ...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
    ... )
    >>>
    >>> # Iterate over matched directories
    >>> for matched_dirs in matcher:
    ...     print(matched_dirs.yyyydoy)
    ...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    ...     print(f"  Found {len(rinex_files)} RINEX files")

    >>> # Get list of common dates
    >>> dates = matcher.get_common_dates()
    >>> print(f"Found {len(dates)} dates with data")

    """

    def __init__(
        self,
        root: Path,
        reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
        canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
    ) -> None:
        """Initialize matcher with directory structure."""
        import warnings

        warnings.warn(
            "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.root = Path(root)
        self.reference_dir = self.root / reference_pattern
        self.canopy_dir = self.root / canopy_pattern

        # Validate directories exist
        self._validate_directory(self.root, "Root")
        self._validate_directory(self.reference_dir, "Reference")
        self._validate_directory(self.canopy_dir, "Canopy")

    def __iter__(self) -> Iterator[MatchedDirs]:
        """Iterate over matched directory pairs with RINEX files.

        Yields
        ------
        MatchedDirs
            Matched directories for each date.

        """
        for date_str in self.get_common_dates():
            yield MatchedDirs(
                canopy_data_dir=self.canopy_dir / date_str,
                reference_data_dir=self.reference_dir / date_str,
                yyyydoy=YYYYDOY.from_yydoy_str(date_str),
            )

    def get_common_dates(self) -> list[str]:
        """Get dates with RINEX files in both receivers.

        Uses parallel processing to check directories efficiently.

        Returns
        -------
        list[str]
            Sorted list of date strings (YYDDD format, e.g., "25001")
            that have RINEX files in both canopy and reference directories.

        """
        # Find dates with RINEX in each receiver
        ref_dates = self._get_dates_with_rinex(self.reference_dir)
        can_dates = self._get_dates_with_rinex(self.canopy_dir)

        # Find intersection
        common = ref_dates & can_dates
        common.discard("00000")  # Remove placeholder directories

        # Sort naturally (numerical order)
        return natsorted(common)

    def _get_dates_with_rinex(self, base_dir: Path) -> set[str]:
        """Find all date directories containing RINEX files.

        Uses parallel processing to check multiple directories at once.

        Parameters
        ----------
        base_dir : Path
            Base directory to search (e.g., canopy or reference root).

        Returns
        -------
        set[str]
            Set of date directory names that contain RINEX files.

        """
        # Get all subdirectories
        date_dirs = (d for d in base_dir.iterdir() if d.is_dir())

        # Check for RINEX files in parallel
        dates_with_rinex = set()

        with ThreadPoolExecutor() as executor:
            future_to_dir = {
                executor.submit(self._has_rinex_files, d): d for d in date_dirs
            }

            for future in as_completed(future_to_dir):
                directory = future_to_dir[future]
                if future.result():
                    dates_with_rinex.add(directory.name)

        return dates_with_rinex

    @staticmethod
    def _has_rinex_files(directory: Path) -> bool:
        """Check if directory contains RINEX observation files.

        Parameters
        ----------
        directory : Path
            Directory to check.

        Returns
        -------
        bool
            True if RINEX files found.

        """
        return _has_rinex_files(directory)

    def _validate_directory(self, path: Path, name: str) -> None:
        """Validate directory exists.

        Parameters
        ----------
        path : Path
            Directory to check.
        name : str
            Name for error message.

        Raises
        ------
        FileNotFoundError
            If directory doesn't exist.

        """
        if not path.exists():
            msg = f"{name} directory not found: {path}"
            raise FileNotFoundError(msg)

__init__(root, reference_pattern=Path('01_reference/01_GNSS/01_raw'), canopy_pattern=Path('02_canopy/01_GNSS/01_raw'))

Initialize matcher with directory structure.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    root: Path,
    reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
) -> None:
    """Initialize matcher with directory structure."""
    import warnings

    warnings.warn(
        "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.root = Path(root)
    self.reference_dir = self.root / reference_pattern
    self.canopy_dir = self.root / canopy_pattern

    # Validate directories exist
    self._validate_directory(self.root, "Root")
    self._validate_directory(self.reference_dir, "Reference")
    self._validate_directory(self.canopy_dir, "Canopy")

__iter__()

Iterate over matched directory pairs with RINEX files.

Yields

MatchedDirs
    Matched directories for each date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[MatchedDirs]:
    """Iterate over matched directory pairs with RINEX files.

    Yields
    ------
    MatchedDirs
        Matched directories for each date.

    """
    for date_str in self.get_common_dates():
        yield MatchedDirs(
            canopy_data_dir=self.canopy_dir / date_str,
            reference_data_dir=self.reference_dir / date_str,
            yyyydoy=YYYYDOY.from_yydoy_str(date_str),
        )

get_common_dates()

Get dates with RINEX files in both receivers.

Uses parallel processing to check directories efficiently.

Returns

list[str]
    Sorted list of date strings (YYDDD format, e.g., "25001") that have
    RINEX files in both canopy and reference directories.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def get_common_dates(self) -> list[str]:
    """Get dates with RINEX files in both receivers.

    Uses parallel processing to check directories efficiently.

    Returns
    -------
    list[str]
        Sorted list of date strings (YYDDD format, e.g., "25001")
        that have RINEX files in both canopy and reference directories.

    """
    # Find dates with RINEX in each receiver
    ref_dates = self._get_dates_with_rinex(self.reference_dir)
    can_dates = self._get_dates_with_rinex(self.canopy_dir)

    # Find intersection
    common = ref_dates & can_dates
    common.discard("00000")  # Remove placeholder directories

    # Sort naturally (numerical order)
    return natsorted(common)
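The method above discards the "00000" placeholder and natural-sorts the remaining YYDDD strings. The effect can be sketched with the stdlib alone (a numeric sort key stands in for natsorted here, which is equivalent for fixed-width digit strings):

```python
# Illustrative: YYDDD date strings ("25001" = 2025, day-of-year 1).
dates = {"25010", "00000", "25001", "24365"}
dates.discard("00000")           # drop placeholder directories
common = sorted(dates, key=int)  # numeric order, as natsorted would give
```

Because YYDDD names are always five digits, plain lexicographic sorting would also work; natsorted keeps the behavior robust if directory names ever vary in width.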

PairDataDirMatcher

Match RINEX directories for receiver pairs across dates.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site. Requires a configuration dict specifying receiver locations and analysis pairs. Deprecated: initialization raises a DeprecationWarning; use canvod.virtualiconvname.FilenameMapper with DataDirectoryValidator instead.

Parameters

base_dir : Path
    Root directory containing all receiver data.
receivers : dict
    Receiver configuration mapping receiver names to their directory paths.
    The directory value is the full relative path from base_dir to the raw
    RINEX data directory (before the {YYDOY} date folders).
    Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
              "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
analysis_pairs : dict
    Analysis pair configuration specifying which receivers to match.
    Example: {"pair_01": {"canopy_receiver": "canopy_01",
                          "reference_receiver": "reference_01"}}

Examples

receivers = {
    "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    "reference_01": {"directory": "01_reference/01_GNSS/01_raw"},
}
pairs = {
    "main_pair": {
        "canopy_receiver": "canopy_01",
        "reference_receiver": "reference_01",
    },
}

matcher = PairDataDirMatcher(
    base_dir=Path("/data/01_Rosalia"),
    receivers=receivers,
    analysis_pairs=pairs,
)

for matched in matcher:
    print(f"{matched.yyyydoy}: {matched.pair_name}")
    print(f"  Canopy: {matched.canopy_data_dir}")
    print(f"  Reference: {matched.reference_data_dir}")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class PairDataDirMatcher:
    """Match RINEX directories for receiver pairs across dates.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site. Requires a configuration dict
    specifying receiver locations and analysis pairs.

    Parameters
    ----------
    base_dir : Path
        Root directory containing all receiver data
    receivers : dict
        Receiver configuration mapping receiver names to their directory paths.
        The ``directory`` value is the full relative path from ``base_dir`` to the
        raw RINEX data directory (before the ``{YYDOY}`` date folders).
        Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
                  "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
    analysis_pairs : dict
        Analysis pair configuration specifying which receivers to match
        Example: {"pair_01": {"canopy_receiver": "canopy_01",
                               "reference_receiver": "reference_01"}}

    Examples
    --------
    >>> receivers = {
    ...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    ...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
    ... }
    >>> pairs = {
    ...     "main_pair": {
    ...         "canopy_receiver": "canopy_01",
    ...         "reference_receiver": "reference_01"
    ...     }
    ... }
    >>>
    >>> matcher = PairDataDirMatcher(
    ...     base_dir=Path("/data/01_Rosalia"),
    ...     receivers=receivers,
    ...     analysis_pairs=pairs
    ... )
    >>>
    >>> for matched in matcher:
    ...     print(f"{matched.yyyydoy}: {matched.pair_name}")
    ...     print(f"  Canopy: {matched.canopy_data_dir}")
    ...     print(f"  Reference: {matched.reference_data_dir}")

    """

    def __init__(
        self,
        base_dir: Path,
        receivers: dict[str, dict[str, str]],
        analysis_pairs: dict[str, dict[str, str]],
    ) -> None:
        """Initialize pair matcher with receiver configuration."""
        import warnings

        warnings.warn(
            "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.base_dir = Path(base_dir)
        self.receivers = receivers
        self.analysis_pairs = analysis_pairs

        # Validate receivers have directory config
        self.receiver_dirs = self._build_receiver_dir_mapping()

    def _build_receiver_dir_mapping(self) -> dict[str, str]:
        """Map receiver names to their directory prefixes.

        Returns
        -------
        dict[str, str]
            Mapping of receiver name to directory path.

        Raises
        ------
        ValueError
            If receiver missing 'directory' in config.

        """
        mapping = {}
        for receiver_name, config in self.receivers.items():
            if "directory" not in config:
                msg = f"Receiver '{receiver_name}' missing 'directory' in config"
                raise ValueError(msg)
            mapping[receiver_name] = config["directory"]
        return mapping

    def _get_receiver_path(self, receiver_name: str, yyyydoy: YYYYDOY) -> Path:
        """Build full path to receiver data for a specific date.

        Parameters
        ----------
        receiver_name : str
            Receiver name (e.g., "canopy_01").
        yyyydoy : YYYYDOY
            Date object.

        Returns
        -------
        Path
            Full path to receiver's RINEX directory for the date.

        """
        receiver_dir = self.receiver_dirs[receiver_name]

        # Convert YYYYDDD to YYDDD format for directory name
        yyddd_str = yyyydoy.yydoy
        if yyddd_str is None:
            msg = f"Missing YYDDD representation for date {yyyydoy}"
            raise ValueError(msg)

        return self.base_dir / receiver_dir / yyddd_str

    def _get_all_dates(self) -> set[YYYYDOY]:
        """Find all dates that have data in any receiver directory.

        Returns
        -------
        set[YYYYDOY]
            Set of all dates with available data.

        """
        all_dates = set()

        for receiver_name in self.receivers:
            receiver_dir = self.receiver_dirs[receiver_name]
            receiver_base = self.base_dir / receiver_dir

            if not receiver_base.exists():
                continue

            # Find all date directories (format: YYDDD - 5 digits)
            for date_dir in receiver_base.iterdir():
                if not date_dir.is_dir():
                    continue

                # Check if directory name is 5 digits
                if len(date_dir.name) != DATE_DIR_LEN or not date_dir.name.isdigit():
                    continue

                # Skip placeholder directories
                if date_dir.name == "00000":
                    continue

                try:
                    yyyydoy = YYYYDOY.from_yydoy_str(date_dir.name)
                    all_dates.add(yyyydoy)
                except ValueError:
                    continue

        return all_dates

    def __iter__(self) -> Iterator[PairMatchedDirs]:
        """Iterate over all date/pair combinations with available data.

        Yields
        ------
        PairMatchedDirs
            Matched directories for a receiver pair on a specific date.

        """
        all_dates = sorted(self._get_all_dates())

        for yyyydoy in all_dates:
            # For each configured analysis pair
            for pair_name, pair_config in self.analysis_pairs.items():
                canopy_rx = pair_config["canopy_receiver"]
                reference_rx = pair_config["reference_receiver"]

                # Build paths for this pair
                canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
                reference_path = self._get_receiver_path(reference_rx, yyyydoy)

                # Check for RINEX files
                canopy_has_files = _has_rinex_files(canopy_path)
                reference_has_files = _has_rinex_files(reference_path)

                # Only yield if both directories exist and have data
                if canopy_has_files and reference_has_files:
                    yield PairMatchedDirs(
                        yyyydoy=yyyydoy,
                        pair_name=pair_name,
                        canopy_receiver=canopy_rx,
                        reference_receiver=reference_rx,
                        canopy_data_dir=canopy_path,
                        reference_data_dir=reference_path,
                    )

__init__(base_dir, receivers, analysis_pairs)

Initialize pair matcher with receiver configuration.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    base_dir: Path,
    receivers: dict[str, dict[str, str]],
    analysis_pairs: dict[str, dict[str, str]],
) -> None:
    """Initialize pair matcher with receiver configuration."""
    import warnings

    warnings.warn(
        "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.base_dir = Path(base_dir)
    self.receivers = receivers
    self.analysis_pairs = analysis_pairs

    # Validate receivers have directory config
    self.receiver_dirs = self._build_receiver_dir_mapping()

__iter__()

Iterate over all date/pair combinations with available data.

Yields

PairMatchedDirs
    Matched directories for a receiver pair on a specific date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[PairMatchedDirs]:
    """Iterate over all date/pair combinations with available data.

    Yields
    ------
    PairMatchedDirs
        Matched directories for a receiver pair on a specific date.

    """
    all_dates = sorted(self._get_all_dates())

    for yyyydoy in all_dates:
        # For each configured analysis pair
        for pair_name, pair_config in self.analysis_pairs.items():
            canopy_rx = pair_config["canopy_receiver"]
            reference_rx = pair_config["reference_receiver"]

            # Build paths for this pair
            canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
            reference_path = self._get_receiver_path(reference_rx, yyyydoy)

            # Check for RINEX files
            canopy_has_files = _has_rinex_files(canopy_path)
            reference_has_files = _has_rinex_files(reference_path)

            # Only yield if both directories exist and have data
            if canopy_has_files and reference_has_files:
                yield PairMatchedDirs(
                    yyyydoy=yyyydoy,
                    pair_name=pair_name,
                    canopy_receiver=canopy_rx,
                    reference_receiver=reference_rx,
                    canopy_data_dir=canopy_path,
                    reference_data_dir=reference_path,
                )

MatchedDirs dataclass

Matched directory paths for canopy and reference receivers.

Immutable container representing a pair of directories containing RINEX data for the same date.

Parameters

canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference (open-sky) receiver RINEX directory.
yyyydoy : YYYYDOY
    Date object for this matched pair.

Examples

from pathlib import Path
from canvod.utils.tools import YYYYDOY

md = MatchedDirs(
    canopy_data_dir=Path("/data/02_canopy/25001"),
    reference_data_dir=Path("/data/01_reference/25001"),
    yyyydoy=YYYYDOY.from_str("2025001"),
)
md.yyyydoy.to_str()  # '2025001'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass(frozen=True)
class MatchedDirs:
    """Matched directory paths for canopy and reference receivers.

    Immutable container representing a pair of directories containing
    RINEX data for the same date.

    Parameters
    ----------
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference (open-sky) receiver RINEX directory.
    yyyydoy : YYYYDOY
        Date object for this matched pair.

    Examples
    --------
    >>> from pathlib import Path
    >>> from canvod.utils.tools import YYYYDOY
    >>>
    >>> md = MatchedDirs(
    ...     canopy_data_dir=Path("/data/02_canopy/25001"),
    ...     reference_data_dir=Path("/data/01_reference/25001"),
    ...     yyyydoy=YYYYDOY.from_str("2025001")
    ... )
    >>> md.yyyydoy.to_str()
    '2025001'

    """

    canopy_data_dir: Path
    reference_data_dir: Path
    yyyydoy: YYYYDOY

PairMatchedDirs dataclass

Matched directories for a receiver pair on a specific date.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site.

Parameters

yyyydoy : YYYYDOY
    Date for this matched pair.
pair_name : str
    Identifier for this receiver pair (e.g., "pair_01").
canopy_receiver : str
    Name of canopy receiver (e.g., "canopy_01").
reference_receiver : str
    Name of reference receiver (e.g., "reference_01").
canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference receiver RINEX directory.

Examples

pmd = PairMatchedDirs(
    yyyydoy=YYYYDOY.from_str("2025001"),
    pair_name="pair_01",
    canopy_receiver="canopy_01",
    reference_receiver="reference_01",
    canopy_data_dir=Path("/data/canopy_01/25001"),
    reference_data_dir=Path("/data/reference_01/25001"),
)
pmd.pair_name  # 'pair_01'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass
class PairMatchedDirs:
    """Matched directories for a receiver pair on a specific date.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site.

    Parameters
    ----------
    yyyydoy : YYYYDOY
        Date for this matched pair.
    pair_name : str
        Identifier for this receiver pair (e.g., "pair_01").
    canopy_receiver : str
        Name of canopy receiver (e.g., "canopy_01").
    reference_receiver : str
        Name of reference receiver (e.g., "reference_01").
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference receiver RINEX directory.

    Examples
    --------
    >>> pmd = PairMatchedDirs(
    ...     yyyydoy=YYYYDOY.from_str("2025001"),
    ...     pair_name="pair_01",
    ...     canopy_receiver="canopy_01",
    ...     reference_receiver="reference_01",
    ...     canopy_data_dir=Path("/data/canopy_01/25001"),
    ...     reference_data_dir=Path("/data/reference_01/25001")
    ... )
    >>> pmd.pair_name
    'pair_01'

    """

    yyyydoy: YYYYDOY
    pair_name: str
    canopy_receiver: str
    reference_receiver: str
    canopy_data_dir: Path
    reference_data_dir: Path

validate_dataset(ds, required_vars=None)

Validate ds meets the GNSSDataReader output contract.

Collects all violations and raises a single ValueError listing every problem, rather than stopping at the first failure.

Parameters

ds : xr.Dataset
    Dataset to validate.
required_vars : list of str, optional
    Data variables that must be present. Defaults to
    :data:DEFAULT_REQUIRED_VARS (["SNR"]).

Raises

ValueError
    If any contract violation is found.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dataset(ds: xr.Dataset, required_vars: list[str] | None = None) -> None:
    """Validate *ds* meets the GNSSDataReader output contract.

    Collects **all** violations and raises a single ``ValueError`` listing
    every problem, rather than stopping at the first failure.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset to validate.
    required_vars : list of str, optional
        Data variables that must be present.  Defaults to
        :data:`DEFAULT_REQUIRED_VARS` (``["SNR"]``).

    Raises
    ------
    ValueError
        If any contract violation is found.
    """
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)

    errors: list[str] = []

    # -- dimensions --
    missing_dims = set(REQUIRED_DIMS) - set(ds.dims)
    if missing_dims:
        errors.append(f"Missing required dimensions: {missing_dims}")

    # -- coordinates --
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in ds.coords:
            errors.append(f"Missing required coordinate: {coord}")
            continue

        actual_dtype = str(ds[coord].dtype)
        if expected_dtype == "object":
            # Accept object (VariableLengthUTF8, stable Zarr V3) and numpy 2.0
            # StringDType (same stable type, different numpy representation).
            # Reject <U* (FixedLengthUTF32) — no stable Zarr V3 spec.
            is_valid_string = actual_dtype == "object" or actual_dtype.startswith(
                "StringDType"
            )
            if not is_valid_string:
                errors.append(
                    f"Coordinate {coord} has wrong dtype: "
                    f"expected string (object/StringDType), got {actual_dtype}"
                )
        elif expected_dtype not in actual_dtype:
            errors.append(
                f"Coordinate {coord} has wrong dtype: "
                f"expected {expected_dtype}, got {actual_dtype}"
            )

    # -- data variables --
    missing_vars = set(required_vars) - set(ds.data_vars)
    if missing_vars:
        errors.append(f"Missing required data variables: {missing_vars}")

    expected_var_dims = ("epoch", "sid")
    for var in ds.data_vars:
        if ds[var].dims != expected_var_dims:
            errors.append(
                f"Data variable {var} has wrong dimensions: "
                f"expected {expected_var_dims}, got {ds[var].dims}"
            )

    # -- attributes --
    missing_attrs = REQUIRED_ATTRS - set(ds.attrs.keys())
    if missing_attrs:
        errors.append(f"Missing required attributes: {missing_attrs}")

    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )
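validate_dataset collects every violation before raising, so a caller sees the full list of problems in one pass instead of fixing them one at a time. That collect-then-raise pattern can be sketched standalone; the key check below is illustrative only, not the real dataset contract:

```python
def check_contract(obj: dict, required_keys: tuple[str, ...] = ("epoch", "sid")) -> None:
    """Collect every violation, then raise a single ValueError listing all of them."""
    errors: list[str] = []
    for key in required_keys:
        if key not in obj:
            errors.append(f"Missing required key: {key}")
    if errors:
        # One exception carrying every problem, formatted one per line
        raise ValueError(
            "Validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )

try:
    check_contract({"epoch": [0]})
    message = ""
except ValueError as e:
    message = str(e)  # lists every missing key in a single error
```

The same shape applies to the real validator: run all checks unconditionally, then raise once at the end.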

RINEX v3.04

RINEX v3.04 observation file reader.

Migrated from: gnssvodpy/rinexreader/rinex_reader.py

Changes from original:

- Updated imports to use canvod.readers.gnss_specs
- Added structured logging for LLM-friendly diagnostics
- Removed IcechunkPreprocessor calls (TODO: move to canvod-store)
- Preserved all other functionality

Classes:

- Rnxv3Header: Parse RINEX v3 headers
- Rnxv3Obs: Main reader class, converts RINEX to xarray Dataset

Rnxv3Header

Bases: BaseModel

Enhanced RINEX v3 header following the original implementation logic.

Key changes from previous version:

- date field is now datetime (like original)
- Uses the original parsing logic for __get_pgm_runby_date

Notes

This is a Pydantic BaseModel configured with ConfigDict (frozen, validate_assignment, arbitrary_types_allowed, str_strip_whitespace). Prefer :meth:from_file for construction.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
class Rnxv3Header(BaseModel):
    """Enhanced RINEX v3 header following the original implementation logic.

    Key changes from previous version:
    - date field is now datetime (like original)
    - Uses the original parsing logic for __get_pgm_runby_date

    Notes
    -----
    This is a Pydantic `BaseModel` configured with `ConfigDict` (frozen,
    validate_assignment, arbitrary_types_allowed, str_strip_whitespace). Prefer
    :meth:`from_file` for construction.

    """

    model_config = ConfigDict(
        frozen=True,
        validate_assignment=True,
        arbitrary_types_allowed=True,
        str_strip_whitespace=True,
    )

    # Required fields
    fpath: Path
    version: float
    filetype: str
    rinextype: str
    systems: str
    pgm: str
    run_by: str
    date: datetime
    marker_name: str
    observer: str
    agency: str
    receiver_number: str
    receiver_type: str
    receiver_version: str
    antenna_number: str
    antenna_type: str
    approx_position: list[pint.Quantity]
    antenna_position: list[pint.Quantity]
    t0: dict[str, datetime]
    signal_strength_unit: pint.Unit | str
    obs_codes_per_system: dict[str, list[str]]

    # Optional fields with defaults
    comment: str | None = None
    marker_number: int | None = None
    marker_type: str | None = None
    glonass_cod: str | None = None
    glonass_phs: str | None = None
    glonass_bis: str | None = None
    glonass_slot_freq_dict: dict[str, int] = Field(default_factory=dict)
    leap_seconds: pint.Quantity | None = None
    system_phase_shift: dict[str, dict[str, float | None]] = Field(default_factory=dict)

    @field_validator("marker_number", mode="before")
    @classmethod
    def parse_marker_number(cls, v: object) -> int | None:
        """Convert empty strings to None, parse valid integers."""
        if v is None or (isinstance(v, str) and not v.strip()):
            return None
        try:
            if not isinstance(v, (str, int, float)):
                return None
            return int(v)
        except (ValueError, TypeError):
            return None

    @classmethod
    def from_file(cls, fpath: Path) -> Self:
        """Create header from a RINEX file."""
        # External validation models handle file and version checks
        _ = RnxObsFileModel(fpath=fpath)

        try:
            header = gr.rinexheader(fpath)
        except (OSError, ValueError, TypeError) as e:
            msg = f"Failed to read RINEX header: {e}"
            raise ValueError(msg) from e

        cast(Any, RnxVersion3Model).version_must_be_3(header["version"])

        # Parse and create instance using original logic
        parsed_data = cls._parse_header_data(cast(dict[str, Any], header), fpath)
        return cls.model_validate(parsed_data)

    @staticmethod
    def _parse_header_data(
        header: dict[str, Any],
        fpath: Path,
    ) -> dict[str, Any]:
        """Parse raw header into structured data using original logic.

        Parameters
        ----------
        header : dict[str, Any]
            Raw header dictionary returned by `georinex`.
        fpath : Path
            Path to the RINEX file.

        Returns
        -------
        dict[str, Any]
            Parsed header data suitable for model validation.

        """
        data = {
            "fpath": fpath,
            "version": header.get("version", 3.0),
            "filetype": header.get("filetype", ""),
            "rinextype": header.get("rinextype", ""),
            "systems": header.get("systems", ""),
        }

        if "PGM / RUN BY / DATE" in header:
            pgm, run_by, date_dt = Rnxv3Header._get_pgm_runby_date(header)
            data.update(
                {
                    "pgm": pgm,
                    "run_by": run_by,
                    "date": date_dt,  # This is now a datetime object
                }
            )
        else:
            data.update(
                {
                    "pgm": "",
                    "run_by": "",
                    "date": datetime.now(UTC),  # Default to current time
                }
            )

        if "OBSERVER / AGENCY" in header:
            observer, agency = Rnxv3Header._get_observer_agency(header)
            data.update({"observer": observer, "agency": agency})
        else:
            data.update({"observer": "", "agency": ""})

        if "REC # / TYPE / VERS" in header:
            rec_num, rec_type, rec_version = Rnxv3Header._get_receiver_num_type_version(
                header
            )
            data.update(
                {
                    "receiver_number": rec_num,
                    "receiver_type": rec_type,
                    "receiver_version": rec_version,
                }
            )
        else:
            data.update(
                {"receiver_number": "", "receiver_type": "", "receiver_version": ""}
            )

        if "ANT # / TYPE" in header:
            ant_num, ant_type = Rnxv3Header._get_antenna_num_type(header)
            data.update({"antenna_number": ant_num, "antenna_type": ant_type})
        else:
            data.update({"antenna_number": "", "antenna_type": ""})

        # Parse positions with safe fallbacks
        pos_parts = header.get("APPROX POSITION XYZ", "0 0 0").split()
        delta_parts = header.get("ANTENNA: DELTA H/E/N", "0 0 0").split()

        def safe_float(s: str, default: float = 0.0) -> float:
            try:
                return float(s)
            except (ValueError, TypeError):
                return default

        pos_y = (
            safe_float(pos_parts[1]) * UREG.meters
            if len(pos_parts) > 1
            else 0.0 * UREG.meters
        )
        pos_z = (
            safe_float(pos_parts[2]) * UREG.meters
            if len(pos_parts) > POSITION_PARTS_MIN
            else 0.0 * UREG.meters
        )
        ant_y = (
            safe_float(delta_parts[1]) * UREG.meters
            if len(delta_parts) > 1
            else 0.0 * UREG.meters
        )
        ant_z = (
            safe_float(delta_parts[2]) * UREG.meters
            if len(delta_parts) > DELTA_PARTS_MIN
            else 0.0 * UREG.meters
        )

        data.update(
            {
                "approx_position": [
                    safe_float(pos_parts[0]) * UREG.meters,
                    pos_y,
                    pos_z,
                ],
                "antenna_position": [
                    safe_float(delta_parts[0]) * UREG.meters,
                    ant_y,
                    ant_z,
                ],
            }
        )

        if "TIME OF FIRST OBS" in header:
            data["t0"] = Rnxv3Header._get_time_of_first_obs(header)
        else:
            now = datetime.now(UTC)
            data["t0"] = {
                "UTC": now if now.tzinfo is not None else now.replace(tzinfo=UTC),
                "GPS": now,
            }

        # Signal strength unit
        data["signal_strength_unit"] = Rnxv3Header._get_signal_strength_unit(header)

        # Basic fields
        data.update(
            {
                "comment": header.get("COMMENT"),
                "marker_name": header.get("MARKER NAME", "").strip(),
                "marker_number": header.get("MARKER NUMBER"),
                "marker_type": header.get("MARKER TYPE"),
                "obs_codes_per_system": header.get("fields", {}),
            }
        )

        # Optional GLONASS fields using original methods
        if "GLONASS COD/PHS/BIS" in header:
            cod, phs, bis = Rnxv3Header._get_glonass_cod_phs_bis(header)
            data.update({"glonass_cod": cod, "glonass_phs": phs, "glonass_bis": bis})

        if "GLONASS SLOT / FRQ #" in header:
            data["glonass_slot_freq_dict"] = Rnxv3Header._get_glonass_slot_freq_num(
                header
            )

        # Leap seconds
        if "LEAP SECONDS" in header:
            leap_parts = header["LEAP SECONDS"].split()
            if leap_parts and leap_parts[0].lstrip("-").isdigit():
                data["leap_seconds"] = int(leap_parts[0]) * UREG.seconds

        # System phase shift using original method
        if "SYS / PHASE SHIFT" in header:
            data["system_phase_shift"] = Rnxv3Header._get_sys_phase_shift(header)
        else:
            data["system_phase_shift"] = {}

        return data

    @staticmethod
    def _get_pgm_runby_date(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, datetime]:
        """Parse ``PGM / RUN BY / DATE`` into program, run_by, and datetime.

        Based on the original __get_pgm_runby_date method.
        """
        header_value = header_dict.get("PGM / RUN BY / DATE", "")
        components = header_value.split()

        if not components:
            return "", "", datetime.now(UTC)

        pgm = components[0]
        run_by = components[1] if len(components) > PGM_RUNBY_MIN_COMPONENTS else ""

        # Original logic for extracting date components
        date = (
            [components[-3], components[-2], components[-1]]
            if len(components) > 1
            else None
        )

        if date:
            try:
                # Original parsing logic
                dt = datetime.strptime(
                    date[0] + date[1],
                    "%Y%m%d%H%M%S",
                )
                tz = pytz.timezone(date[2])  # e.g., "UTC"
                localized_date = tz.localize(dt)
                return pgm, run_by, localized_date
            except (ValueError, TypeError) as e:
                print(f"Warning: Could not parse date components {date}: {e}")
                return pgm, run_by, datetime.now(UTC)
        else:
            return pgm, run_by, datetime.now(UTC)

    @staticmethod
    def _get_observer_agency(header_dict: dict[str, Any]) -> tuple[str, str]:
        """Parse ``OBSERVER / AGENCY`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str]
            (observer, agency).

        """
        header_value = header_dict.get("OBSERVER / AGENCY", "")
        try:
            observer, agency = header_value.split(maxsplit=1)
            return observer, agency
        except ValueError:
            return "", ""

    @staticmethod
    def _get_receiver_num_type_version(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, str]:
        """Parse ``REC # / TYPE / VERS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str, str]
            (receiver_number, receiver_type, receiver_version).

        """
        header_value = header_dict.get("REC # / TYPE / VERS", "")
        components = header_value.split()

        if not components:
            return "", "", ""
        if len(components) == 1:
            return components[0], "", ""
        if len(components) == RECEIVER_COMPONENTS_SECOND:
            return components[0], components[1], ""
        return components[0], " ".join(components[1:-1]), components[-1]

    @staticmethod
    def _get_antenna_num_type(header_dict: dict[str, Any]) -> tuple[str, str]:
        """Parse ``ANT # / TYPE`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str]
            (antenna_number, antenna_type).

        """
        header_value = header_dict.get("ANT # / TYPE", "")
        components = header_value.split()

        if not components:
            return "", ""
        if len(components) == 1:
            return components[0], ""
        return components[0], " ".join(components[1:])

    @staticmethod
    def _get_time_of_first_obs(
        header_dict: dict[str, Any],
    ) -> dict[str, datetime]:
        """Parse ``TIME OF FIRST OBS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, datetime]
            Mapping of time system labels to datetimes.

        """
        header_value = header_dict.get("TIME OF FIRST OBS", "")
        components = header_value.split()

        if len(components) < TIME_OF_FIRST_OBS_MIN_COMPONENTS:
            now = datetime.now(UTC)
            return {"UTC": now, "GPS": now}

        try:
            year, month, day = map(int, components[:3])
            hour, minute = map(int, components[3:5])
            second = float(components[5])

            dt_gps = datetime(
                year,
                month,
                day,
                hour,
                minute,
                int(second),
                int((second - int(second)) * 1e6),
                tzinfo=UTC,
            )

            gps_utc_offset = timedelta(seconds=18)
            dt_utc = dt_gps - gps_utc_offset
            tz = pytz.timezone("UTC")

            return {"UTC": tz.localize(dt_utc), "GPS": dt_gps}

        except (ValueError, TypeError, IndexError):
            now = datetime.now(UTC)
            return {"UTC": now, "GPS": now}

    @staticmethod
    def _get_glonass_cod_phs_bis(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, str]:
        """Parse ``GLONASS COD/PHS/BIS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str, str]
            (glonass_cod, glonass_phs, glonass_bis).

        """
        header_value = header_dict.get("GLONASS COD/PHS/BIS", "")
        components = header_value.split()

        if len(components) >= GLONASS_COD_PHS_MIN_COMPONENTS:
            c1c = f"{components[0]} {components[1]}"
            c2c = f"{components[2]} {components[3]}"
            c2p = f"{components[4]} {components[5]}"
            return c1c, c2c, c2p
        return "", "", ""

    @staticmethod
    def _get_glonass_slot_freq_num(
        header_dict: dict[str, Any],
    ) -> dict[str, int]:
        """Parse ``GLONASS SLOT / FRQ #`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, int]
            Mapping of slot to frequency number.

        """
        header_value = header_dict.get("GLONASS SLOT / FRQ #", "")
        components = header_value.split()

        result = {}
        for i in range(1, len(components), 2):  # Skip first component
            if i + 1 < len(components):
                try:
                    slot = components[i]
                    freq_num = int(components[i + 1])
                    result[slot] = freq_num
                except (ValueError, IndexError):
                    continue

        return result

    @staticmethod
    def _get_sys_phase_shift(
        header_dict: dict[str, Any],
    ) -> dict[str, dict[str, float | None]]:
        """Parse ``SYS / PHASE SHIFT`` records.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, dict[str, float | None]]
            Mapping of system to signal phase shifts.

        """
        header_value = header_dict.get("SYS / PHASE SHIFT", "")
        components = header_value.split()

        sys_phase_shift_dict = defaultdict(dict)
        i = 0

        while i < len(components):
            if i >= len(components):
                break

            system_abbrv = components[i]

            if i + 1 >= len(components):
                break
            signal_code = components[i + 1]

            # Check if there's a phase shift value
            phase_shift = None
            if (
                i + 2 < len(components)
                and components[i + 2].replace(".", "", 1).replace("-", "", 1).isdigit()
            ):
                try:
                    phase_shift = float(components[i + 2])
                    i += 3
                except (ValueError, TypeError):
                    i += 2
            else:
                i += 2

            sys_phase_shift_dict[system_abbrv][signal_code] = phase_shift

        return {k: dict(v) for k, v in sys_phase_shift_dict.items()}

    @staticmethod
    def _get_signal_strength_unit(
        header_dict: dict[str, Any],
    ) -> pint.Unit | str:
        """Parse ``SIGNAL STRENGTH UNIT`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        pint.Unit or str
            Parsed unit or a default string.

        """
        header_value = header_dict.get("SIGNAL STRENGTH UNIT", "").strip()

        # Using match statement like original
        match header_value:
            case "DBHZ":
                return UREG.dBHz
            case "DB":
                return UREG.dB
            case _:
                return header_value if header_value else "dB"

    @property
    def is_mixed_systems(self) -> bool:
        """Check if the RINEX file contains mixed GNSS systems."""
        return self.systems == "M"

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return (
            f"Rnxv3Header(file='{self.fpath.name}', "
            f"version={self.version}, "
            f"systems='{self.systems}')"
        )

    def __str__(self) -> str:
        """Return a human-readable header summary."""
        systems_str = "Mixed" if self.systems == "M" else self.systems
        return (
            f"RINEX v{self.version} Header\n"
            f"  File: {self.fpath.name}\n"
            f"  Marker: {self.marker_name}\n"
            f"  Systems: {systems_str}\n"
            f"  Receiver: {self.receiver_type}\n"
            f"  Date: {self.date.strftime('%Y-%m-%d %H:%M:%S %Z')}\n"
        )
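The TIME OF FIRST OBS handling above converts GPS time to UTC with a fixed 18-second offset, the leap-second count accumulated as of 2017. A minimal standalone sketch of that conversion (gps_to_utc is an illustrative helper, not part of the package):

```python
from datetime import datetime, timedelta, timezone

# GPS time runs ahead of UTC by the accumulated leap seconds (18 s since 2017)
GPS_UTC_OFFSET = timedelta(seconds=18)

def gps_to_utc(dt_gps: datetime) -> datetime:
    """Convert a GPS-timescale datetime to UTC by removing the leap-second offset."""
    return dt_gps - GPS_UTC_OFFSET

first_obs_gps = datetime(2025, 1, 1, 0, 0, 18, tzinfo=timezone.utc)
first_obs_utc = gps_to_utc(first_obs_gps)
```

Note the hard-coded offset is only valid for observations after the most recent leap second; a general converter would look the offset up from a leap-second table.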

is_mixed_systems property

Check if the RINEX file contains mixed GNSS systems.

parse_marker_number(v) classmethod

Convert empty strings to None, parse valid integers.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@field_validator("marker_number", mode="before")
@classmethod
def parse_marker_number(cls, v: object) -> int | None:
    """Convert empty strings to None, parse valid integers."""
    if v is None or (isinstance(v, str) and not v.strip()):
        return None
    try:
        if not isinstance(v, (str, int, float)):
            return None
        return int(v)
    except (ValueError, TypeError):
        return None

from_file(fpath) classmethod

Create header from a RINEX file.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@classmethod
def from_file(cls, fpath: Path) -> Self:
    """Create header from a RINEX file."""
    # External validation models handle file and version checks
    _ = RnxObsFileModel(fpath=fpath)

    try:
        header = gr.rinexheader(fpath)
    except (OSError, ValueError, TypeError) as e:
        msg = f"Failed to read RINEX header: {e}"
        raise ValueError(msg) from e

    cast(Any, RnxVersion3Model).version_must_be_3(header["version"])

    # Parse and create instance using original logic
    parsed_data = cls._parse_header_data(cast(dict[str, Any], header), fpath)
    return cls.model_validate(parsed_data)

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return (
        f"Rnxv3Header(file='{self.fpath.name}', "
        f"version={self.version}, "
        f"systems='{self.systems}')"
    )

__str__()

Return a human-readable header summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable header summary."""
    systems_str = "Mixed" if self.systems == "M" else self.systems
    return (
        f"RINEX v{self.version} Header\n"
        f"  File: {self.fpath.name}\n"
        f"  Marker: {self.marker_name}\n"
        f"  Systems: {systems_str}\n"
        f"  Receiver: {self.receiver_type}\n"
        f"  Date: {self.date.strftime('%Y-%m-%d %H:%M:%S %Z')}\n"
    )

Rnxv3Obs

Bases: GNSSDataReader

RINEX v3.04 observation reader.

Attributes

fpath : Path
    Path to the RINEX observation file.
polarization : str, default "RHCP"
    Polarization label for observables.
completeness_mode : {"strict", "warn", "off"}, default "strict"
    Behavior when epoch completeness checks fail.
expected_dump_interval : str or pint.Quantity, optional
    Expected file dump interval for completeness validation.
expected_sampling_interval : str or pint.Quantity, optional
    Expected sampling interval for completeness validation.
apply_overlap_filter : bool, default False
    Whether to filter overlapping signal groups.
overlap_preferences : dict[str, str], optional
    Preferred signals for overlap resolution.
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels.

Notes

Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
class Rnxv3Obs(GNSSDataReader):
    """RINEX v3.04 observation reader.

    Attributes
    ----------
    fpath : Path
        Path to the RINEX observation file.
    polarization : str, default "RHCP"
        Polarization label for observables.
    completeness_mode : {"strict", "warn", "off"}, default "strict"
        Behavior when epoch completeness checks fail.
    expected_dump_interval : str or pint.Quantity, optional
        Expected file dump interval for completeness validation.
    expected_sampling_interval : str or pint.Quantity, optional
        Expected sampling interval for completeness validation.
    apply_overlap_filter : bool, default False
        Whether to filter overlapping signal groups.
    overlap_preferences : dict[str, str], optional
        Preferred signals for overlap resolution.
    aggregate_glonass_fdma : bool, default True
        Whether to aggregate GLONASS FDMA channels.

    Notes
    -----
    Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
    from :class:`GNSSDataReader`.

    """

    model_config = ConfigDict(frozen=True)

    polarization: str = "RHCP"

    completeness_mode: Literal["strict", "warn", "off"] = "strict"
    expected_dump_interval: str | pint.Quantity | None = None
    expected_sampling_interval: str | pint.Quantity | None = None

    apply_overlap_filter: bool = False
    overlap_preferences: dict[str, str] | None = None

    aggregate_glonass_fdma: bool = True

    _header: Rnxv3Header = PrivateAttr()
    _signal_mapper: SignalIDMapper = PrivateAttr()

    _lines: list[str] = PrivateAttr()
    _file_hash: str = PrivateAttr()
    _cached_epoch_batches: list[tuple[int, int]] | None = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _post_init(self) -> Self:
        """Initialize derived state after validation."""
        # Load header once
        self._header = Rnxv3Header.from_file(self.fpath)

        # Initialize signal mapper
        self._signal_mapper = SignalIDMapper(
            aggregate_glonass_fdma=self.aggregate_glonass_fdma
        )

        # Cache file lines and compute the file hash
        self._lines = self._load_file()

        # Optionally auto-check completeness (reads the cached lines)
        if self.completeness_mode != "off":
            try:
                self.validate_epoch_completeness(
                    dump_interval=self.expected_dump_interval,
                    sampling_interval=self.expected_sampling_interval,
                )
            except MissingEpochError as e:
                if self.completeness_mode == "strict":
                    raise
                warnings.warn(str(e), RuntimeWarning, stacklevel=2)

        return self

    @property
    def header(self) -> Rnxv3Header:
        """Expose validated header (read-only).

        Returns
        -------
        Rnxv3Header
            Parsed and validated RINEX header.

        """
        return self._header

    def __str__(self) -> str:
        """Return a human-readable summary."""
        return (
            f"{self.__class__.__name__}:\n"
            f"  File Path: {self.fpath}\n"
            f"  Header: {self.header}\n"
            f"  Polarization: {self.polarization}\n"
        )

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return f"{self.__class__.__name__}(fpath={self.fpath})"

    def _load_file(self) -> list[str]:
        """Read file once, cache lines, and compute hash.

        Returns
        -------
        list[str]
            File contents split into lines.

        """
        if not hasattr(self, "_lines"):
            h = hashlib.sha256()
            with self.fpath.open("rb") as f:  # binary mode for consistent hash
                data = f.read()
                h.update(data)
                self._lines = data.decode("utf-8", errors="replace").splitlines()
            self._file_hash = h.hexdigest()[:16]  # short hash for storage
        return self._lines
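The hash-on-load scheme can be sketched in isolation; a minimal example (synthetic bytes, hypothetical helper name `short_file_hash`) of the truncated SHA-256 digest used for deduplication:

```python
import hashlib


def short_file_hash(data: bytes, length: int = 16) -> str:
    """Truncated SHA-256 hex digest, as used for deduplication."""
    return hashlib.sha256(data).hexdigest()[:length]


# Identical content hashes identically; different content differs.
h1 = short_file_hash(b"> 2024 01 01 00 00  0.0000000  0  4")
h2 = short_file_hash(b"> 2024 01 01 00 30  0.0000000  0  4")
print(h1, h2, h1 != h2)
```

Truncating to 16 hex characters keeps 64 bits of the digest, which is ample for deduplicating a file store while keeping keys short.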

    @property
    def file_hash(self) -> str:
        """Return cached SHA256 short hash of the file content.

        Returns
        -------
        str
            16-character short hash for deduplication.

        """
        return self._file_hash

    @property
    def start_time(self) -> datetime:
        """Return start time of observations from header.

        Returns
        -------
        datetime
            First observation timestamp.

        """
        return min(self.header.t0.values())

    @property
    def end_time(self) -> datetime:
        """Return end time of observations from last epoch.

        Returns
        -------
        datetime
            Last observation timestamp.

        """
        last_epoch = None
        for epoch in self.iter_epochs():
            last_epoch = epoch
        if last_epoch is not None:
            return self.get_datetime_from_epoch_record_info(last_epoch.info)
        return self.start_time

    @property
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers (G, R, E, C, J, S, I).

        """
        if self.header.systems == "M":
            return list(self.header.obs_codes_per_system.keys())
        return [self.header.systems]

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Returns
        -------
        int
            Total epoch count.

        """
        return len(self.get_epoch_record_batches())

    @property
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.

        """
        satellites = set()
        for epoch in self.iter_epochs():
            for sat in epoch.data:
                satellites.add(sat.sv)
        return len(satellites)

    def get_epoch_record_batches(
        self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
    ) -> list[tuple[int, int]]:
        """Get the start and end line numbers for each epoch in the file.

        Parameters
        ----------
        epoch_record_indicator : str, default '>'
            Character marking epoch record lines.

        Returns
        -------
        list of tuple of int
            List of (start_line, end_line) pairs for each epoch.

        """
        if self._cached_epoch_batches is not None:
            return self._cached_epoch_batches

        lines = self._load_file()
        starts = [
            i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
        ]
        starts.append(len(lines))  # Add EOF
        self._cached_epoch_batches = [
            (start, starts[i + 1])
            for i, start in enumerate(starts)
            if i + 1 < len(starts)
        ]
        return self._cached_epoch_batches
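The batching above can be illustrated standalone (synthetic lines; epoch records start with `>` per RINEX v3):

```python
# Each batch spans from its '>' line up to (but excluding) the next one.
lines = [
    "> 2024 01 01 00 00  0.0000000  0  2",
    "G01  20000000.000",
    "G02  21000000.000",
    "> 2024 01 01 00 00 30.0000000  0  1",
    "G01  20000100.000",
]
starts = [i for i, ln in enumerate(lines) if ln.startswith(">")]
starts.append(len(lines))  # sentinel for EOF
batches = [(s, starts[i + 1]) for i, s in enumerate(starts) if i + 1 < len(starts)]
print(batches)  # [(0, 3), (3, 5)]
```

Batch `(0, 3)` covers the epoch record line plus its two satellite data lines.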

    def parse_observation_slice(
        self,
        slice_text: str,
    ) -> tuple[float | None, int | None, int | None]:
        """Parse a RINEX observation slice into value, LLI, and SSI.

        Enhanced to handle both standard 16-character format and
        variable-length records.

        Parameters
        ----------
        slice_text : str
            Observation slice to parse.

        Returns
        -------
        tuple[float | None, int | None, int | None]
            Parsed (value, LLI, SSI) tuple.

        """
        if not slice_text or not slice_text.strip():
            return None, None, None

        try:
            # Method 1: Standard RINEX format with decimal at position -6
            if (
                len(slice_text) >= OBS_SLICE_MIN_LEN
                and len(slice_text) <= OBS_SLICE_MAX_LEN
                and slice_text[OBS_SLICE_DECIMAL_POS] == "."
            ):
                slice_chars = list(slice_text)
                ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
                lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

                # Convert LLI and SSI
                lli = int(lli) if lli.strip() and lli.isdigit() else None
                ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

                # Convert value
                value_str = "".join(slice_chars).strip()
                if value_str:
                    value = float(value_str)
                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        try:
            # Method 2: Flexible parsing for variable-length records
            slice_trimmed = slice_text.strip()
            if not slice_trimmed:
                return None, None, None

            # Look for a decimal point to identify the numeric value
            if "." in slice_trimmed:
                # Find the main numeric value (supports negative numbers)
                number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

                if number_match:
                    value = float(number_match.group(1))

                    # Check for LLI/SSI indicators after the number
                    remaining_part = slice_trimmed[number_match.end() :].strip()
                    lli = None
                    ssi = None

                    # Parse remaining characters as potential LLI/SSI
                    if remaining_part:
                        # Could be just SSI, or LLI followed by SSI
                        if len(remaining_part) == 1:
                            # Just one indicator - assume it's SSI
                            if remaining_part.isdigit():
                                ssi = int(remaining_part)
                        elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                            # Two or more characters - take last two as LLI, SSI
                            lli_char = remaining_part[-2]
                            ssi_char = remaining_part[-1]

                            if lli_char.isdigit():
                                lli = int(lli_char)
                            if ssi_char.isdigit():
                                ssi = int(ssi_char)

                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        # Method 3: Last resort - try simple float parsing
        try:
            simple_value = float(slice_text.strip())
            return simple_value, None, None
        except ValueError:
            pass

        return None, None, None
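A condensed sketch of the fixed-width branch (Method 1) with a regex fallback — field widths follow the standard RINEX F14.3 value plus one-character LLI and SSI layout; `parse_slice` is a simplified stand-in, not the method above:

```python
import re


def parse_slice(slice_text: str):
    """Sketch: 14-char value + LLI + SSI, with a flexible regex fallback."""
    if len(slice_text) == 16 and slice_text[-6] == ".":
        value = float(slice_text[:14])
        lli = int(slice_text[14]) if slice_text[14].isdigit() else None
        ssi = int(slice_text[15]) if slice_text[15].isdigit() else None
        return value, lli, ssi
    m = re.search(r"(-?\d+\.\d+)", slice_text)  # variable-length records
    return (float(m.group(1)), None, None) if m else (None, None, None)


print(parse_slice("  24567890.123 7"))  # blank LLI, SSI = 7
```

The decimal point of an F14.3 value sits six characters from the end of a 16-character slice, which is what the `OBS_SLICE_DECIMAL_POS` check above exploits.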

    def process_satellite_data(self, s: str) -> Satellite:
        """Process satellite data line into a Satellite object with observations.

        Handles variable-length observation records correctly by adaptively parsing
        based on the actual line length and content.
        """
        sv = s[:3].strip()
        satellite = Satellite(sv=sv)
        bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

        # Get the data part (after sv identifier)
        data_part = s[3:]

        # Process each observation adaptively
        for i, band in enumerate(bands_tbe):
            start_idx = i * 16
            end_idx = start_idx + 16

            # Check if we have enough data for this observation
            if start_idx >= len(data_part):
                # No more data available - create empty observation
                observation = Observation(
                    obs_type=band.split("|")[1][0],
                    value=None,
                    lli=None,
                    ssi=None,
                )
                satellite.add_observation(observation)
                continue

            # Extract the slice, but handle variable length
            if end_idx <= len(data_part):
                # Full 16-character slice available
                slice_data = data_part[start_idx:end_idx]
            else:
                # Partial slice - pad with spaces to maintain consistency
                available_slice = data_part[start_idx:]
                slice_data = available_slice.ljust(16)  # Pad with spaces if needed

            value, lli, ssi = self.parse_observation_slice(slice_data)

            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=value,
                lli=lli,
                ssi=ssi,
            )
            satellite.add_observation(observation)

        return satellite
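The adaptive slicing can be seen on a synthetic line: two full 16-character fields plus one missing observable, padded so downstream parsing sees a blank field:

```python
line = "G07  24567890.123 7  12345678.90147"
sv, data = line[:3].strip(), line[3:]
n_obs = 3  # suppose the header lists three observables for this system

fields = []
for i in range(n_obs):
    chunk = data[i * 16 : (i + 1) * 16]
    # Pad short or absent trailing fields to the full 16-character width.
    fields.append(chunk.ljust(16) if chunk else " " * 16)
print(sv, [f.strip() for f in fields])
```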

    @property
    def epochs(self) -> list[Rnxv3ObsEpochRecord]:
        """Materialize all epochs (legacy compatibility).

        Returns
        -------
        list of Rnxv3ObsEpochRecord
            All epochs in memory (prefer ``iter_epochs`` for memory efficiency).

        """
        return list(self.iter_epochs())

    def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
        """Yield epochs one by one instead of materializing the whole list.

        Yields
        ------
        Rnxv3ObsEpochRecord
            Each epoch with timestamp and satellite observations.

        """
        for start, end in self.get_epoch_record_batches():
            try:
                info = Rnxv3ObsEpochRecordLineModel.model_validate(
                    {"epoch": self._lines[start]}
                )

                # Skip event epochs (flag 2-6: special records, not observations)
                if info.epoch_flag > 1:
                    continue

                # Filter out blank/whitespace-only lines from data slice
                data = [line for line in self._lines[start + 1 : end] if line.strip()]
                epoch = Rnxv3ObsEpochRecord(
                    info=info,
                    data=[self.process_satellite_data(line) for line in data],
                )
                yield epoch
            except (InvalidEpochError, IncompleteEpochError, ValueError):
                # Skip epochs with validation errors (invalid SV, malformed data,
                # pydantic ValidationError inherits from ValueError)
                pass

    def iter_epochs_in_range(
        self,
        start: datetime,
        end: datetime,
    ) -> Iterable[Rnxv3ObsEpochRecord]:
        """Yield epochs lazily that fall into the given datetime range.

        Parameters
        ----------
        start : datetime
            Start of time range (inclusive)
        end : datetime
            End of time range (inclusive)

        Yields
        ------
        Rnxv3ObsEpochRecord
            Epochs within the time range.

        """
        for epoch in self.iter_epochs():
            dt = self.get_datetime_from_epoch_record_info(epoch.info)
            if start <= dt <= end:
                yield epoch

    def get_datetime_from_epoch_record_info(
        self,
        epoch_record_info: Rnxv3ObsEpochRecordLineModel,
    ) -> datetime:
        """Convert epoch record info to datetime object.

        Parameters
        ----------
        epoch_record_info : Rnxv3ObsEpochRecordLineModel
            Parsed epoch record line

        Returns
        -------
        datetime
            Timestamp from epoch record

        """
        return datetime(
            year=int(epoch_record_info.year),
            month=int(epoch_record_info.month),
            day=int(epoch_record_info.day),
            hour=int(epoch_record_info.hour),
            minute=int(epoch_record_info.minute),
            second=int(epoch_record_info.seconds),
            tzinfo=UTC,
        )

    @staticmethod
    def epochrecordinfo_dt_to_numpy_dt(
        epch: Rnxv3ObsEpochRecord,
    ) -> np.datetime64:
        """Convert Python datetime to numpy datetime64[ns].

        Parameters
        ----------
        epch : Rnxv3ObsEpochRecord
            Epoch record containing timestamp info

        Returns
        -------
        np.datetime64
            Numpy datetime64 with nanosecond precision

        """
        dt = datetime(
            year=int(epch.info.year),
            month=int(epch.info.month),
            day=int(epch.info.day),
            hour=int(epch.info.hour),
            minute=int(epch.info.minute),
            second=int(epch.info.seconds),
            tzinfo=UTC,
        )
        # np.datetime64 doesn't support timezone info, but datetime is already UTC
        # Convert to naive datetime (UTC) to avoid warning
        return np.datetime64(dt.replace(tzinfo=None), "ns")
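The timezone handling deserves a standalone sketch: `np.datetime64` rejects timezone-aware datetimes, so the UTC timestamp is made naive first (the instant itself is unchanged):

```python
from datetime import datetime, timezone

import numpy as np

# An aware UTC timestamp, as produced from the epoch record fields.
dt = datetime(2024, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

# Strip tzinfo before conversion; the wall-clock value is already UTC.
ts = np.datetime64(dt.replace(tzinfo=None), "ns")
print(ts)  # 2024-01-01T12:30:00.000000000
```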

    def _epoch_datetimes(self) -> list[datetime]:
        """Extract epoch datetimes from the file.

        Uses the same epoch parsing logic already implemented.
        """
        dts: list[datetime] = []

        for start, _end in self.get_epoch_record_batches():
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )
            dts.append(
                datetime(
                    year=int(info.year),
                    month=int(info.month),
                    day=int(info.day),
                    hour=int(info.hour),
                    minute=int(info.minute),
                    second=int(info.seconds),
                    tzinfo=UTC,
                )
            )
        return dts

    def infer_sampling_interval(self) -> pint.Quantity | None:
        """Infer sampling interval from consecutive epoch deltas.

        Returns
        -------
        pint.Quantity or None
            Sampling interval in seconds, or None if cannot be inferred

        """
        dts = self._epoch_datetimes()
        if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
            return None
        # Compute deltas
        deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
        if not deltas:
            return None
        # Pick the most common delta (robust to an occasional missing epoch)
        seconds = Counter(
            int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
        )
        if not seconds:
            return None
        mode_seconds, _ = seconds.most_common(1)[0]
        return (mode_seconds * UREG.second).to(UREG.seconds)

    def infer_dump_interval(
        self, sampling_interval: pint.Quantity | None = None
    ) -> pint.Quantity | None:
        """Infer the intended dump interval for the RINEX file.

        Parameters
        ----------
        sampling_interval : pint.Quantity, optional
            Known sampling interval. If provided, returns (#epochs * sampling_interval)

        Returns
        -------
        pint.Quantity or None
            Dump interval in seconds, or None if cannot be inferred

        """
        idx = self.get_epoch_record_batches()
        n_epochs = len(idx)
        if n_epochs == 0:
            return None

        if sampling_interval is not None:
            return (n_epochs * sampling_interval).to(UREG.seconds)

        # Fallback: time coverage inclusive (last - first) + typical step
        dts = self._epoch_datetimes()
        if len(dts) == 0:
            return None
        if len(dts) == 1:
            # single epoch: treat as 1 * unknown step (cannot infer)
            return None

        # Estimate step from data
        est_step = self.infer_sampling_interval()
        if est_step is None:
            return None

        # Inclusive coverage often equals (n_epochs - 1) * step; intended
        # dump interval is n_epochs * step.
        return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

    def validate_epoch_completeness(
        self,
        dump_interval: str | pint.Quantity | None = None,
        sampling_interval: str | pint.Quantity | None = None,
    ) -> None:
        """Validate that the number of epochs matches the expected dump interval.

        Parameters
        ----------
        dump_interval : str or pint.Quantity, optional
            Expected file dump interval. If None, inferred from epochs.
        sampling_interval : str or pint.Quantity, optional
            Expected sampling interval. If None, inferred from epochs.

        Returns
        -------
        None

        Raises
        ------
        MissingEpochError
            If total sampling time doesn't match dump interval
        ValueError
            If intervals cannot be inferred

        """
        # Normalize/Infer sampling interval
        if sampling_interval is None:
            inferred = self.infer_sampling_interval()
            if inferred is None:
                msg = "Could not infer sampling interval from epochs"
                raise ValueError(msg)
            sampling_interval = inferred
        # normalize to pint
        elif not isinstance(sampling_interval, pint.Quantity):
            sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

        # Normalize/Infer dump interval
        if dump_interval is None:
            inferred_dump = self.infer_dump_interval(
                sampling_interval=sampling_interval
            )
            if inferred_dump is None:
                msg = "Could not infer dump interval from file"
                raise ValueError(msg)
            dump_interval = inferred_dump
        elif not isinstance(dump_interval, pint.Quantity):
            # Accept '15 min', '1h', etc.
            dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

        # Build inputs for the validator model
        epoch_indices = self.get_epoch_record_batches()

        # This throws MissingEpochError automatically if inconsistent
        cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
            epoch_records_indeces=epoch_indices,
            rnx_file_dump_interval=dump_interval,
            sampling_interval=sampling_interval,
        )

    def filter_by_overlapping_groups(
        self,
        ds: xr.Dataset,
        group_preference: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Filter overlapping bands using per-group preferences.

        Parameters
        ----------
        ds : xr.Dataset
            Dataset with `sid` dimension and signal properties.
        group_preference : dict[str, str], optional
            Mapping of overlap group to preferred band.

        Returns
        -------
        xr.Dataset
            Dataset filtered to preferred overlapping bands.

        """
        if group_preference is None:
            group_preference = {
                "L1_E1_B1I": "L1",
                "L5_E5a": "L5",
                "L2_E5b_B2b": "L2",
            }

        keep = []
        for sid in ds.sid.values:
            parts = str(sid).split("|")
            band = parts[1] if len(parts) >= 2 else ""
            group = self._signal_mapper.get_overlapping_group(band)
            if group and group in group_preference:
                if band == group_preference[group]:
                    keep.append(sid)
            else:
                keep.append(sid)
        return ds.sel(sid=keep)
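A self-contained sketch of the preference filter with a stubbed group lookup (synthetic SIDs; `group_of` stands in for `SignalIDMapper.get_overlapping_group`):

```python
group_of = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
preference = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}
sids = ["G01|L1|C", "E03|E1|C", "G01|L5|Q", "E03|E5a|Q", "R05|G1|C"]

keep = []
for sid in sids:
    band = sid.split("|")[1]
    group = group_of.get(band)
    if group in preference:
        if band == preference[group]:  # keep only the preferred band
            keep.append(sid)
    else:
        keep.append(sid)  # bands outside any overlap group pass through
print(keep)  # ['G01|L1|C', 'G01|L5|Q', 'R05|G1|C']
```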

    def _precompute_sids_from_header(
        self,
    ) -> tuple[list[str], dict[str, dict[str, object]]]:
        """Build sorted SID list and properties from header info alone.

        Uses the header's obs_codes_per_system and static constellation
        SV lists to pre-compute the full theoretical SID set, eliminating
        the discovery pass.

        Returns
        -------
        sorted_sids : list[str]
            Sorted list of signal IDs.
        sid_properties : dict[str, dict[str, object]]
            Mapping of SID to its properties (sv, system, band, code,
            freq_center, freq_min, freq_max, bandwidth, overlapping_group).

        """
        mapper = self._signal_mapper
        signal_ids: set[str] = set()
        sid_properties: dict[str, dict[str, object]] = {}

        # Pre-compute pint arithmetic once per unique band
        band_freq_cache: dict[str, tuple[float, float, float, float]] = {}

        for system, obs_codes in self.header.obs_codes_per_system.items():
            svs = _get_constellation_svs(system)

            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]

                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )

                # Cache frequency arithmetic per band
                if band_name not in band_freq_cache:
                    center_frequency = mapper.get_band_frequency(band_name)
                    bandwidth = mapper.get_band_bandwidth(band_name)

                    if center_frequency is not None and bandwidth is not None:
                        bw = bandwidth[0] if isinstance(bandwidth, list) else bandwidth
                        freq_min = center_frequency - (bw / 2.0)
                        freq_max = center_frequency + (bw / 2.0)
                        band_freq_cache[band_name] = (
                            float(center_frequency),
                            float(freq_min),
                            float(freq_max),
                            float(bw),
                        )
                    else:
                        band_freq_cache[band_name] = (
                            np.nan,
                            np.nan,
                            np.nan,
                            np.nan,
                        )

                freq_center, freq_min, freq_max, bw = band_freq_cache[band_name]
                overlapping_group = mapper.get_overlapping_group(band_name)

                sid_suffix = "|" + band_name + "|" + code_char

                for sv in svs:
                    sid = sv + sid_suffix
                    if sid not in signal_ids:
                        signal_ids.add(sid)
                        sid_properties[sid] = {
                            "sv": sv,
                            "system": system,
                            "band": band_name,
                            "code": code_char,
                            "freq_center": freq_center,
                            "freq_min": freq_min,
                            "freq_max": freq_max,
                            "bandwidth": bw,
                            "overlapping_group": overlapping_group,
                        }

        sorted_sids = sorted(signal_ids)
        return sorted_sids, {s: sid_properties[s] for s in sorted_sids}
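SID construction from header obs codes can be sketched with synthetic inputs (each code such as `"S1C"` splits into observation type, band digit, and tracking code):

```python
# Hypothetical stand-ins for SYSTEM_BANDS, the header obs codes, and SV lists.
system_bands = {"G": {"1": "L1", "2": "L2"}}
obs_codes = {"G": ["S1C", "S2W"]}
svs = ["G01", "G02"]

sids = []
for sys, codes in obs_codes.items():
    for code in codes:
        band = system_bands[sys].get(code[1], f"UnknownBand{code[1]}")
        for sv in svs:
            sids.append(f"{sv}|{band}|{code[2]}")
sids.sort()
print(sids)  # ['G01|L1|C', 'G01|L2|W', 'G02|L1|C', 'G02|L2|W']
```

Because the set is derived from the header alone, the theoretical SID axis is known before any observation line is parsed, which is what removes the discovery pass.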

    def _create_dataset_single_pass(self) -> xr.Dataset:
        """Create xarray Dataset in a single pass over the file.

        Pre-allocates arrays using header-derived SID set and epoch count,
        then fills them by parsing observations inline without Pydantic
        models or function-call overhead.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and standard variables.

        """
        lines = self._load_file()
        epoch_batches = self.get_epoch_record_batches()
        n_epochs = len(epoch_batches)

        sorted_sids, sid_properties = self._precompute_sids_from_header()
        n_sids = len(sorted_sids)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}

        # Pre-allocate arrays
        timestamps = np.empty(n_epochs, dtype="datetime64[ns]")
        snr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pseudo = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        phase = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        doppler = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        lli = np.full((n_epochs, n_sids), -1, dtype=DTYPES["LLI"])
        ssi = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        # Build obs_code → (obs_type, sid_suffix) lookup per system
        mapper = self._signal_mapper
        system_obs_lut: dict[str, list[tuple[str, str]]] = {}
        for system, obs_codes in self.header.obs_codes_per_system.items():
            lut: list[tuple[str, str]] = []
            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    lut.append(("", ""))
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]
                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )
                obs_type = obs_code[0]
                lut.append((obs_type, "|" + band_name + "|" + code_char))
            system_obs_lut[system] = lut

        # Single pass over all epochs — skip unparseable epoch lines
        valid_mask = np.ones(n_epochs, dtype=bool)
        for t_idx, (start, end) in enumerate(epoch_batches):
            epoch_line = lines[start]

            # Inline epoch parsing (no Pydantic model)
            m = _EPOCH_RE.match(epoch_line)
            if m is None:
                valid_mask[t_idx] = False
                continue

            year, month, day = int(m[1]), int(m[2]), int(m[3])
            hour, minute = int(m[4]), int(m[5])
            seconds = float(m[6])
            sec_int = int(seconds)
            usec = int((seconds - sec_int) * 1_000_000)
            ts = np.datetime64(
                f"{year:04d}-{month:02d}-{day:02d}"
                f"T{hour:02d}:{minute:02d}:{sec_int:02d}",
                "ns",
            )
            ts += np.timedelta64(usec, "us")
            timestamps[t_idx] = ts

            # Parse satellite data lines inline
            for line_idx in range(start + 1, end):
                sat_line = lines[line_idx]
                if len(sat_line) < 3:
                    continue
                sv = sat_line[:3].strip()
                if not sv:
                    continue
                system = sv[0]
                lut_list = system_obs_lut.get(system)
                if lut_list is None:
                    continue

                data_part = sat_line[3:]
                data_part_len = len(data_part)

                for i, (obs_type, sid_suffix) in enumerate(lut_list):
                    if not obs_type:
                        continue

                    col_start = i * 16
                    if col_start >= data_part_len:
                        break

                    sid_key = sv + sid_suffix
                    s_idx = sid_to_idx.get(sid_key)
                    if s_idx is None:
                        continue

                    col_end = col_start + 16
                    slice_text = data_part[col_start:col_end]

                    value, obs_lli, obs_ssi = _parse_obs_fast(slice_text)
                    if value is None:
                        continue

                    if obs_type == "S":
                        if value != 0:
                            snr[t_idx, s_idx] = value
                    elif obs_type == "C":
                        pseudo[t_idx, s_idx] = value
                    elif obs_type == "L":
                        phase[t_idx, s_idx] = value
                    elif obs_type == "D":
                        doppler[t_idx, s_idx] = value

                    if obs_lli is not None:
                        lli[t_idx, s_idx] = obs_lli
                    if obs_ssi is not None:
                        ssi[t_idx, s_idx] = obs_ssi

        # Drop epochs that failed to parse
        if not valid_mask.all():
            timestamps = timestamps[valid_mask]
            snr = snr[valid_mask]
            pseudo = pseudo[valid_mask]
            phase = phase[valid_mask]
            doppler = doppler[valid_mask]
            lli = lli[valid_mask]
            ssi = ssi[valid_mask]

        # Build coordinate arrays from pre-computed properties
        sv_list = np.array(
            [sid_properties[sid]["sv"] for sid in sorted_sids], dtype=object
        )
        constellation_list = np.array(
            [sid_properties[sid]["system"] for sid in sorted_sids], dtype=object
        )
        band_list = np.array(
            [sid_properties[sid]["band"] for sid in sorted_sids], dtype=object
        )
        code_list = np.array(
            [sid_properties[sid]["code"] for sid in sorted_sids], dtype=object
        )
        freq_center_list = [sid_properties[sid]["freq_center"] for sid in sorted_sids]
        freq_min_list = [sid_properties[sid]["freq_min"] for sid in sorted_sids]
        freq_max_list = [sid_properties[sid]["freq_max"] for sid in sorted_sids]

        signal_id_coord = xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        )
        coords = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": signal_id_coord,
            "sv": ("sid", sv_list, COORDS_METADATA["sv"]),
            "system": ("sid", constellation_list, COORDS_METADATA["system"]),
            "band": ("sid", band_list, COORDS_METADATA["band"]),
            "code": ("sid", code_list, COORDS_METADATA["code"]),
            "freq_center": (
                "sid",
                np.asarray(freq_center_list, dtype=DTYPES["freq_center"]),
                COORDS_METADATA["freq_center"],
            ),
            "freq_min": (
                "sid",
                np.asarray(freq_min_list, dtype=DTYPES["freq_min"]),
                COORDS_METADATA["freq_min"],
            ),
            "freq_max": (
                "sid",
                np.asarray(freq_max_list, dtype=DTYPES["freq_max"]),
                COORDS_METADATA["freq_max"],
            ),
        }

        if self.header.signal_strength_unit == UREG.dBHz:
            snr_meta = CN0_METADATA
        else:
            snr_meta = SNR_METADATA

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr, snr_meta),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pseudo,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (
                    ["epoch", "sid"],
                    phase,
                    OBSERVABLES_METADATA["Phase"],
                ),
                "Doppler": (
                    ["epoch", "sid"],
                    doppler,
                    OBSERVABLES_METADATA["Doppler"],
                ),
                "LLI": (
                    ["epoch", "sid"],
                    lli,
                    OBSERVABLES_METADATA["LLI"],
                ),
                "SSI": (
                    ["epoch", "sid"],
                    ssi,
                    OBSERVABLES_METADATA["SSI"],
                ),
            },
            coords=coords,
            attrs={**self._build_attrs()},
        )

        if self.apply_overlap_filter:
            ds = self.filter_by_overlapping_groups(ds, self.overlap_preferences)

        return ds

    def create_rinex_netcdf_with_signal_id(
        self,
        start: datetime | None = None,
        end: datetime | None = None,
    ) -> xr.Dataset:
        """Create a NetCDF dataset with signal IDs.

        Always uses the fast single-pass path.  Optionally restricts to
        epochs within a datetime range via post-filtering.

        Parameters
        ----------
        start : datetime, optional
            Start of time range (inclusive).
        end : datetime, optional
            End of time range (inclusive).

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid).

        """
        ds = self._create_dataset_single_pass()

        if start or end:
            ds = ds.sel(epoch=slice(start, end))

        return ds

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert RINEX observations to xarray.Dataset with signal ID structure.

        Parameters
        ----------
        keep_data_vars : list of str or None, optional
            Data variables to include in dataset. Defaults to config value.

        Other Parameters
        ----------------
        outname : Path or str, optional
            If provided, saves dataset to this file path.
        write_global_attrs : bool, default False
            If True, adds comprehensive global attributes.
        pad_global_sid : bool, default True
            If True, pads to global signal ID space.
        strip_fillval : bool, default True
            If True, removes fill values.
        add_future_datavars : bool, default True
            If True, adds placeholder variables for future data.
        keep_sids : list of str or None, default None
            If provided, filters/pads dataset to these specific SIDs.
            If None and pad_global_sid=True, pads to all possible SIDs.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and requested data variables

        """
        outname = cast(Path | str | None, kwargs.pop("outname", None))
        write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
        pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
        strip_fillval = bool(kwargs.pop("strip_fillval", True))
        add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
        keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

        if keep_data_vars is None:
            from canvod.utils.config import load_config

            keep_data_vars = load_config().processing.processing.keep_rnx_vars

        ds = self.create_rinex_netcdf_with_signal_id()

        # drop unwanted vars
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            # Pad/filter to specified sids or all possible sids
            ds = pad_to_global_sid(ds, keep_sids=keep_sids)

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        if add_future_datavars:
            # Placeholder: adding future data variables is not yet implemented
            pass

        if write_global_attrs:
            ds.attrs.update(self._create_comprehensive_attrs())

        ds.attrs.update(self._build_attrs())

        if outname:
            from canvod.utils.config import load_config as _load_config

            comp = _load_config().processing.compression
            encoding = {
                var: {"zlib": comp.zlib, "complevel": comp.complevel}
                for var in ds.data_vars
            }
            ds.to_netcdf(str(outname), encoding=encoding)

        # Validate output structure for pipeline compatibility
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

    def validate_rinex_304_compliance(
        self,
        ds: xr.Dataset | None = None,
        strict: bool = False,
        print_report: bool = True,
    ) -> dict[str, list[str]]:
        """Run enhanced RINEX 3.04 specification validation.

        Validates:
        1. System-specific observation codes
        2. GLONASS mandatory fields (slot/frequency, biases)
        3. Phase shift records (RINEX 3.01+)
        4. Observation value ranges

        Parameters
        ----------
        ds : xr.Dataset, optional
            Dataset to validate. If None, creates one from current file.
        strict : bool
            If True, raise ValueError on validation failures
        print_report : bool
            If True, print validation report to console

        Returns
        -------
        dict[str, list[str]]
            Validation results by category

        Examples
        --------
        >>> reader = Rnxv3Obs(fpath="station.24o")
        >>> results = reader.validate_rinex_304_compliance()
        >>> # Or validate a specific dataset
        >>> ds = reader.to_ds()
        >>> results = reader.validate_rinex_304_compliance(ds=ds)

        """
        if ds is None:
            ds = self.to_ds(write_global_attrs=False)

        # Prepare header dict for validators
        header_dict: dict[str, Any] = {
            "obs_codes_per_system": self.header.obs_codes_per_system,
        }

        # Add GLONASS-specific headers if available
        if hasattr(self.header, "glonass_slot_frq"):
            header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

        if hasattr(self.header, "glonass_cod_phs_bis"):
            header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

        if hasattr(self.header, "phase_shift"):
            header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

        # Run validation
        results = RINEX304ComplianceValidator.validate_all(
            ds=ds, header_dict=header_dict, strict=strict
        )

        if print_report:
            RINEX304ComplianceValidator.print_validation_report(results)

        return results

    def _create_comprehensive_attrs(self) -> dict[str, object]:
        attrs: dict[str, object] = {
            "File Path": str(self.fpath),
            "File Type": self.header.filetype,
            "RINEX Version": self.header.version,
            "RINEX Type": self.header.rinextype,
            "Observer": self.header.observer,
            "Agency": self.header.agency,
            "Date": self.header.date.isoformat(),
            "Marker Name": self.header.marker_name,
            "Marker Number": self.header.marker_number,
            "Marker Type": self.header.marker_type,
            "Approximate Position": (
                f"(X = {self.header.approx_position[0].magnitude} "
                f"{self.header.approx_position[0].units:~}, "
                f"Y = {self.header.approx_position[1].magnitude} "
                f"{self.header.approx_position[1].units:~}, "
                f"Z = {self.header.approx_position[2].magnitude} "
                f"{self.header.approx_position[2].units:~})"
            ),
            "Receiver Type": self.header.receiver_type,
            "Receiver Version": self.header.receiver_version,
            "Receiver Number": self.header.receiver_number,
            "Antenna Type": self.header.antenna_type,
            "Antenna Number": self.header.antenna_number,
            "Antenna Position": (
                f"(X = {self.header.antenna_position[0].magnitude} "
                f"{self.header.antenna_position[0].units:~}, "
                f"Y = {self.header.antenna_position[1].magnitude} "
                f"{self.header.antenna_position[1].units:~}, "
                f"Z = {self.header.antenna_position[2].magnitude} "
                f"{self.header.antenna_position[2].units:~})"
            ),
            "Program": self.header.pgm,
            "Run By": self.header.run_by,
            "Time of First Observation": json.dumps(
                {k: v.isoformat() for k, v in self.header.t0.items()}
            ),
            "GLONASS COD": self.header.glonass_cod,
            "GLONASS PHS": self.header.glonass_phs,
            "GLONASS BIS": self.header.glonass_bis,
            "GLONASS Slot Frequency Dict": json.dumps(
                self.header.glonass_slot_freq_dict
            ),
            "Leap Seconds": f"{self.header.leap_seconds:~}",
        }
        return attrs
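The single-pass construction in _create_dataset_single_pass relies on pre-allocating NaN-filled (epoch, sid) arrays and writing each parsed observation straight into its cell via a dict lookup, so missing observations need no special handling. A minimal sketch of that fill pattern (the SID strings are made up for illustration):

```python
import numpy as np

# Pre-allocate a NaN-filled (epoch, sid) array, as the single-pass reader does
n_epochs, n_sids = 3, 2
snr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

# Map signal IDs to column indices once, then fill cells by O(1) lookup
sid_to_idx = {"G01|L1|C": 0, "G02|L1|C": 1}
snr[0, sid_to_idx["G02|L1|C"]] = 45.0

# Unfilled cells stay NaN, marking missing observations
```

This avoids any per-observation object allocation; the arrays are handed directly to xarray at the end.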

header property

Expose validated header (read-only).

Returns

Rnxv3Header : Parsed and validated RINEX header.

file_hash property

Return cached SHA256 short hash of the file content.

Returns

str : 16-character short hash for deduplication.

start_time property

Return start time of observations from header.

Returns

datetime : First observation timestamp.

end_time property

Return end time of observations from last epoch.

Returns

datetime : Last observation timestamp.

systems property

Return list of GNSS systems in file.

Returns

list of str : System identifiers (G, R, E, C, J, S, I).

num_epochs property

Return number of epochs in file.

Returns

int : Total epoch count.

num_satellites property

Return total number of unique satellites observed.

Returns

int : Count of unique satellite vehicles across all systems.

epochs property

Materialize all epochs (legacy compatibility).

Returns

list of Rnxv3ObsEpochRecord : All epochs in memory (use iter_epochs for efficiency).

__str__()

Return a human-readable summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable summary."""
    return (
        f"{self.__class__.__name__}:\n"
        f"  File Path: {self.fpath}\n"
        f"  Header: {self.header}\n"
        f"  Polarization: {self.polarization}\n"
    )

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return f"{self.__class__.__name__}(fpath={self.fpath})"

get_epoch_record_batches(epoch_record_indicator=EPOCH_RECORD_INDICATOR)

Get the start and end line numbers for each epoch in the file.

Parameters

epoch_record_indicator : str, default '>'
    Character marking epoch record lines.

Returns

list of tuple of int : List of (start_line, end_line) pairs for each epoch.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_epoch_record_batches(
    self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
) -> list[tuple[int, int]]:
    """Get the start and end line numbers for each epoch in the file.

    Parameters
    ----------
    epoch_record_indicator : str, default '>'
        Character marking epoch record lines.

    Returns
    -------
    list of tuple of int
        List of (start_line, end_line) pairs for each epoch.

    """
    if self._cached_epoch_batches is not None:
        return self._cached_epoch_batches

    lines = self._load_file()
    starts = [
        i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
    ]
    starts.append(len(lines))  # Add EOF
    self._cached_epoch_batches = [
        (start, starts[i + 1])
        for i, start in enumerate(starts)
        if i + 1 < len(starts)
    ]
    return self._cached_epoch_batches
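The batching logic can be seen on a toy file: collect the indices of lines starting with '>', append the file length as an EOF sentinel, and pair consecutive indices. A self-contained sketch (the observation lines are made up):

```python
lines = [
    "> 2024 01 01 00 00  0.0000000  0  2",
    "G01  24178587.236 7",
    "G02  22034455.118 6",
    "> 2024 01 01 00 00 30.0000000  0  1",
    "G01  24178601.410 7",
]

# Indices of epoch record lines, plus EOF as a sentinel
starts = [i for i, line in enumerate(lines) if line.startswith(">")]
starts.append(len(lines))

# Pair consecutive starts into (start_line, end_line) half-open batches
batches = [(starts[i], starts[i + 1]) for i in range(len(starts) - 1)]
# batches == [(0, 3), (3, 5)]
```

Each batch covers the epoch record line plus its satellite data lines, which is exactly what the single-pass parser iterates over.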

parse_observation_slice(slice_text)

Parse a RINEX observation slice into value, LLI, and SSI.

Enhanced to handle both standard 16-character format and variable-length records.

Parameters

slice_text : str
    Observation slice to parse.

Returns

tuple[float | None, int | None, int | None] : Parsed (value, LLI, SSI) tuple.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def parse_observation_slice(
    self,
    slice_text: str,
) -> tuple[float | None, int | None, int | None]:
    """Parse a RINEX observation slice into value, LLI, and SSI.

    Enhanced to handle both standard 16-character format and
    variable-length records.

    Parameters
    ----------
    slice_text : str
        Observation slice to parse.

    Returns
    -------
    tuple[float | None, int | None, int | None]
        Parsed (value, LLI, SSI) tuple.

    """
    if not slice_text or not slice_text.strip():
        return None, None, None

    try:
        # Method 1: Standard RINEX format with decimal at position -6
        if (
            len(slice_text) >= OBS_SLICE_MIN_LEN
            and len(slice_text) <= OBS_SLICE_MAX_LEN
            and slice_text[OBS_SLICE_DECIMAL_POS] == "."
        ):
            slice_chars = list(slice_text)
            ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
            lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

            # Convert LLI and SSI
            lli = int(lli) if lli.strip() and lli.isdigit() else None
            ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

            # Convert value
            value_str = "".join(slice_chars).strip()
            if value_str:
                value = float(value_str)
                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    try:
        # Method 2: Flexible parsing for variable-length records
        slice_trimmed = slice_text.strip()
        if not slice_trimmed:
            return None, None, None

        # Look for a decimal point to identify the numeric value
        if "." in slice_trimmed:
            # Find the main numeric value (supports negative numbers)
            number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

            if number_match:
                value = float(number_match.group(1))

                # Check for LLI/SSI indicators after the number
                remaining_part = slice_trimmed[number_match.end() :].strip()
                lli = None
                ssi = None

                # Parse remaining characters as potential LLI/SSI
                if remaining_part:
                    # Could be just SSI, or LLI followed by SSI
                    if len(remaining_part) == 1:
                        # Just one indicator - assume it's SSI
                        if remaining_part.isdigit():
                            ssi = int(remaining_part)
                    elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                        # Two or more characters - take last two as LLI, SSI
                        lli_char = remaining_part[-2]
                        ssi_char = remaining_part[-1]

                        if lli_char.isdigit():
                            lli = int(lli_char)
                        if ssi_char.isdigit():
                            ssi = int(ssi_char)

                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    # Method 3: Last resort - try simple float parsing
    try:
        simple_value = float(slice_text.strip())
        return simple_value, None, None
    except ValueError:
        pass

    return None, None, None

process_satellite_data(s)

Process satellite data line into a Satellite object with observations.

Handles variable-length observation records correctly by adaptively parsing based on the actual line length and content.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def process_satellite_data(self, s: str) -> Satellite:
    """Process satellite data line into a Satellite object with observations.

    Handles variable-length observation records correctly by adaptively parsing
    based on the actual line length and content.
    """
    sv = s[:3].strip()
    satellite = Satellite(sv=sv)
    bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

    # Get the data part (after sv identifier)
    data_part = s[3:]

    # Process each observation adaptively
    for i, band in enumerate(bands_tbe):
        start_idx = i * 16
        end_idx = start_idx + 16

        # Check if we have enough data for this observation
        if start_idx >= len(data_part):
            # No more data available - create empty observation
            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=None,
                lli=None,
                ssi=None,
            )
            satellite.add_observation(observation)
            continue

        # Extract the slice, but handle variable length
        if end_idx <= len(data_part):
            # Full 16-character slice available
            slice_data = data_part[start_idx:end_idx]
        else:
            # Partial slice - pad with spaces to maintain consistency
            available_slice = data_part[start_idx:]
            slice_data = available_slice.ljust(16)  # Pad with spaces if needed

        value, lli, ssi = self.parse_observation_slice(slice_data)

        observation = Observation(
            obs_type=band.split("|")[1][0],
            value=value,
            lli=lli,
            ssi=ssi,
        )
        satellite.add_observation(observation)

    return satellite
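The adaptive slicing above amounts to cutting the data part into 16-character columns and right-padding a truncated trailing column so every field has the same width. A standalone sketch with a made-up data line:

```python
# Two observation columns; the second was truncated by the writer
data_part = "  24178587.236 7  24178590.1"
n_obs = 2

fields = []
for i in range(n_obs):
    chunk = data_part[i * 16 : (i + 1) * 16]
    fields.append(chunk.ljust(16))  # pad a short trailing slice to full width

# Both fields are now uniformly 16 characters wide
```

Padding rather than rejecting short lines keeps receivers that drop trailing blanks from losing otherwise valid observations.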

iter_epochs()

Yield epochs one by one instead of materializing the whole list.

Returns

Generator : Generator yielding Rnxv3ObsEpochRecord objects.

Yields

Rnxv3ObsEpochRecord : Each epoch with timestamp and satellite observations.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
    """Yield epochs one by one instead of materializing the whole list.

    Returns
    -------
    Generator
        Generator yielding Rnxv3ObsEpochRecord objects

    Yields
    ------
    Rnxv3ObsEpochRecord
        Each epoch with timestamp and satellite observations

    """
    for start, end in self.get_epoch_record_batches():
        try:
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )

            # Skip event epochs (flag 2-6: special records, not observations)
            if info.epoch_flag > 1:
                continue

            # Filter out blank/whitespace-only lines from data slice
            data = [line for line in self._lines[start + 1 : end] if line.strip()]
            epoch = Rnxv3ObsEpochRecord(
                info=info,
                data=[self.process_satellite_data(line) for line in data],
            )
            yield epoch
        except (InvalidEpochError, IncompleteEpochError, ValueError):
            # Skip epochs with validation errors (invalid SV, malformed data,
            # pydantic ValidationError inherits from ValueError)
            pass

iter_epochs_in_range(start, end)

Yield epochs lazily that fall into the given datetime range.

Parameters

start : datetime
    Start of time range (inclusive).
end : datetime
    End of time range (inclusive).

Returns

Generator : Generator yielding epochs in the specified range.

Yields

Rnxv3ObsEpochRecord : Epochs within the time range.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs_in_range(
    self,
    start: datetime,
    end: datetime,
) -> Iterable[Rnxv3ObsEpochRecord]:
    """Yield epochs lazily that fall into the given datetime range.

    Parameters
    ----------
    start : datetime
        Start of time range (inclusive)
    end : datetime
        End of time range (inclusive)

    Returns
    -------
    Generator
        Generator yielding epochs in the specified range

    Yields
    ------
    Rnxv3ObsEpochRecord
        Epochs within the time range

    """
    for epoch in self.iter_epochs():
        dt = self.get_datetime_from_epoch_record_info(epoch.info)
        if start <= dt <= end:
            yield epoch

get_datetime_from_epoch_record_info(epoch_record_info)

Convert epoch record info to datetime object.

Parameters

epoch_record_info : Rnxv3ObsEpochRecordLineModel
    Parsed epoch record line.

Returns

datetime : Timestamp from epoch record.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_datetime_from_epoch_record_info(
    self,
    epoch_record_info: Rnxv3ObsEpochRecordLineModel,
) -> datetime:
    """Convert epoch record info to datetime object.

    Parameters
    ----------
    epoch_record_info : Rnxv3ObsEpochRecordLineModel
        Parsed epoch record line

    Returns
    -------
    datetime
        Timestamp from epoch record

    """
    return datetime(
        year=int(epoch_record_info.year),
        month=int(epoch_record_info.month),
        day=int(epoch_record_info.day),
        hour=int(epoch_record_info.hour),
        minute=int(epoch_record_info.minute),
        second=int(epoch_record_info.seconds),
        tzinfo=UTC,
    )

epochrecordinfo_dt_to_numpy_dt(epch) staticmethod

Convert Python datetime to numpy datetime64[ns].

Parameters

epch : Rnxv3ObsEpochRecord
    Epoch record containing timestamp info.

Returns

np.datetime64 : Numpy datetime64 with nanosecond precision.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@staticmethod
def epochrecordinfo_dt_to_numpy_dt(
    epch: Rnxv3ObsEpochRecord,
) -> np.datetime64:
    """Convert Python datetime to numpy datetime64[ns].

    Parameters
    ----------
    epch : Rnxv3ObsEpochRecord
        Epoch record containing timestamp info

    Returns
    -------
    np.datetime64
        Numpy datetime64 with nanosecond precision

    """
    dt = datetime(
        year=int(epch.info.year),
        month=int(epch.info.month),
        day=int(epch.info.day),
        hour=int(epch.info.hour),
        minute=int(epch.info.minute),
        second=int(epch.info.seconds),
        tzinfo=UTC,
    )
    # np.datetime64 doesn't support timezone info, but datetime is already UTC
    # Convert to naive datetime (UTC) to avoid warning
    return np.datetime64(dt.replace(tzinfo=None), "ns")
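The timezone detail is worth isolating: np.datetime64 complains about tz-aware datetimes, so the UTC value is made naive before conversion, which loses nothing since the wall-clock value is already UTC. A minimal sketch:

```python
from datetime import datetime, timezone
import numpy as np

dt = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)

# np.datetime64 rejects or warns on tz-aware inputs; the value is already
# UTC, so dropping tzinfo changes nothing semantically
ts = np.datetime64(dt.replace(tzinfo=None), "ns")
# str(ts) == "2024-01-01T12:00:30.000000000"
```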

infer_sampling_interval()

Infer sampling interval from consecutive epoch deltas.

Returns

pint.Quantity or None : Sampling interval in seconds, or None if it cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_sampling_interval(self) -> pint.Quantity | None:
    """Infer sampling interval from consecutive epoch deltas.

    Returns
    -------
    pint.Quantity or None
        Sampling interval in seconds, or None if it cannot be inferred

    """
    dts = self._epoch_datetimes()
    if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
        return None
    # Compute deltas
    deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
    if not deltas:
        return None
    # Pick the most common delta (robust to an occasional missing epoch)
    seconds = Counter(
        int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
    )
    if not seconds:
        return None
    mode_seconds, _ = seconds.most_common(1)[0]
    return (mode_seconds * UREG.second).to(UREG.seconds)

infer_dump_interval(sampling_interval=None)

Infer the intended dump interval for the RINEX file.

Parameters

sampling_interval : pint.Quantity, optional
    Known sampling interval. If provided, returns (#epochs * sampling_interval).

Returns

pint.Quantity or None : Dump interval in seconds, or None if it cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_dump_interval(
    self, sampling_interval: pint.Quantity | None = None
) -> pint.Quantity | None:
    """Infer the intended dump interval for the RINEX file.

    Parameters
    ----------
    sampling_interval : pint.Quantity, optional
        Known sampling interval. If provided, returns (#epochs * sampling_interval)

    Returns
    -------
    pint.Quantity or None
        Dump interval in seconds, or None if it cannot be inferred

    """
    idx = self.get_epoch_record_batches()
    n_epochs = len(idx)
    if n_epochs == 0:
        return None

    if sampling_interval is not None:
        return (n_epochs * sampling_interval).to(UREG.seconds)

    # Fallback: time coverage inclusive (last - first) + typical step
    dts = self._epoch_datetimes()
    if len(dts) == 0:
        return None
    if len(dts) == 1:
        # single epoch: treat as 1 * unknown step (cannot infer)
        return None

    # Estimate step from data
    est_step = self.infer_sampling_interval()
    if est_step is None:
        return None

    # Inclusive coverage often equals (n_epochs - 1) * step; intended
    # dump interval is n_epochs * step.
    return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

validate_epoch_completeness(dump_interval=None, sampling_interval=None)

Validate that the number of epochs matches the expected dump interval.

Parameters

dump_interval : str or pint.Quantity, optional
    Expected file dump interval. If None, inferred from epochs.
sampling_interval : str or pint.Quantity, optional
    Expected sampling interval. If None, inferred from epochs.

Returns

None

Raises

MissingEpochError
    If the total sampling time doesn't match the dump interval.
ValueError
    If the intervals cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_epoch_completeness(
    self,
    dump_interval: str | pint.Quantity | None = None,
    sampling_interval: str | pint.Quantity | None = None,
) -> None:
    """Validate that the number of epochs matches the expected dump interval.

    Parameters
    ----------
    dump_interval : str or pint.Quantity, optional
        Expected file dump interval. If None, inferred from epochs.
    sampling_interval : str or pint.Quantity, optional
        Expected sampling interval. If None, inferred from epochs.

    Returns
    -------
    None

    Raises
    ------
    MissingEpochError
        If total sampling time doesn't match dump interval
    ValueError
        If intervals cannot be inferred

    """
    # Normalize/Infer sampling interval
    if sampling_interval is None:
        inferred = self.infer_sampling_interval()
        if inferred is None:
            msg = "Could not infer sampling interval from epochs"
            raise ValueError(msg)
        sampling_interval = inferred
    # normalize to pint
    elif not isinstance(sampling_interval, pint.Quantity):
        sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

    # Normalize/Infer dump interval
    if dump_interval is None:
        inferred_dump = self.infer_dump_interval(
            sampling_interval=sampling_interval
        )
        if inferred_dump is None:
            msg = "Could not infer dump interval from file"
            raise ValueError(msg)
        dump_interval = inferred_dump
    elif not isinstance(dump_interval, pint.Quantity):
        # Accept '15 min', '1h', etc.
        dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

    # Build inputs for the validator model
    epoch_indices = self.get_epoch_record_batches()

    # This throws MissingEpochError automatically if inconsistent
    cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
        epoch_records_indeces=epoch_indices,
        rnx_file_dump_interval=dump_interval,
        sampling_interval=sampling_interval,
    )
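
The completeness rule enforced by the validator model amounts to comparing the observed epoch count with dump_interval / sampling_interval. A minimal sketch, with `check_epoch_completeness` and a stand-in `MissingEpochError` defined here for illustration:

```python
class MissingEpochError(ValueError):
    """Raised when a file has fewer epochs than its dump interval implies."""

def check_epoch_completeness(n_epochs: int, dump_s: float, sampling_s: float) -> None:
    """Raise MissingEpochError unless n_epochs matches dump_s / sampling_s."""
    expected = round(dump_s / sampling_s)
    if n_epochs != expected:
        raise MissingEpochError(
            f"Expected {expected} epochs ({dump_s} s / {sampling_s} s), got {n_epochs}"
        )

check_epoch_completeness(120, 3600, 30)  # a complete hourly 30 s file passes
try:
    check_epoch_completeness(118, 3600, 30)  # two epochs missing
except MissingEpochError as e:
    print(e)
```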

filter_by_overlapping_groups(ds, group_preference=None)

Filter overlapping bands using per-group preferences.

Parameters

ds : xr.Dataset
    Dataset with sid dimension and signal properties.
group_preference : dict[str, str], optional
    Mapping of overlap group to preferred band.

Returns

xr.Dataset
    Dataset filtered to preferred overlapping bands.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def filter_by_overlapping_groups(
    self,
    ds: xr.Dataset,
    group_preference: dict[str, str] | None = None,
) -> xr.Dataset:
    """Filter overlapping bands using per-group preferences.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset with `sid` dimension and signal properties.
    group_preference : dict[str, str], optional
        Mapping of overlap group to preferred band.

    Returns
    -------
    xr.Dataset
        Dataset filtered to preferred overlapping bands.

    """
    if group_preference is None:
        group_preference = {
            "L1_E1_B1I": "L1",
            "L5_E5a": "L5",
            "L2_E5b_B2b": "L2",
        }

    keep = []
    for sid in ds.sid.values:
        parts = str(sid).split("|")
        band = parts[1] if len(parts) >= 2 else ""
        group = self._signal_mapper.get_overlapping_group(band)
        if group and group in group_preference:
            if band == group_preference[group]:
                keep.append(sid)
        else:
            keep.append(sid)
    return ds.sel(sid=keep)
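
The keep/drop decision can be exercised on plain sid strings. A stand-alone sketch in which the overlap-group and preference tables are hard-coded, partial stand-ins for `SignalIDMapper.get_overlapping_group` and the default preferences:

```python
# Illustrative tables only; the real mapper covers more bands (e.g. B1I, E5b).
OVERLAP_GROUPS = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
PREFERENCE = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}

def keep_sid(sid: str) -> bool:
    """Keep a sid if its band is preferred within its overlap group."""
    parts = sid.split("|")
    band = parts[1] if len(parts) >= 2 else ""
    group = OVERLAP_GROUPS.get(band)
    if group and group in PREFERENCE:
        return band == PREFERENCE[group]
    return True  # bands outside known overlap groups are always kept

sids = ["G01|L1|C", "E12|E1|C", "G01|L5|Q", "C05|B1I|I"]
print([s for s in sids if keep_sid(s)])  # ['G01|L1|C', 'G01|L5|Q', 'C05|B1I|I']
```

Note that E12|E1|C is dropped because E1 shares a group with the preferred L1, while C05|B1I|I survives here only because this toy table does not list B1I.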

create_rinex_netcdf_with_signal_id(start=None, end=None)

Create a NetCDF dataset with signal IDs.

Always uses the fast single-pass path. Optionally restricts to epochs within a datetime range via post-filtering.

Parameters

start : datetime, optional
    Start of time range (inclusive).
end : datetime, optional
    End of time range (inclusive).

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def create_rinex_netcdf_with_signal_id(
    self,
    start: datetime | None = None,
    end: datetime | None = None,
) -> xr.Dataset:
    """Create a NetCDF dataset with signal IDs.

    Always uses the fast single-pass path.  Optionally restricts to
    epochs within a datetime range via post-filtering.

    Parameters
    ----------
    start : datetime, optional
        Start of time range (inclusive).
    end : datetime, optional
        End of time range (inclusive).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid).

    """
    ds = self._create_dataset_single_pass()

    if start or end:
        ds = ds.sel(epoch=slice(start, end))

    return ds
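
The post-filtering step corresponds to an inclusive [start, end] selection on the epoch axis; stripped of xarray, the selection logic is just (a sketch):

```python
from datetime import datetime, timedelta

def filter_epochs(epochs, start=None, end=None):
    """Keep epochs within the inclusive [start, end] range; open ends pass everything."""
    return [
        e for e in epochs
        if (start is None or e >= start) and (end is None or e <= end)
    ]

t0 = datetime(2024, 1, 1)
epochs = [t0 + timedelta(minutes=m) for m in range(5)]
kept = filter_epochs(epochs, start=t0 + timedelta(minutes=1), end=t0 + timedelta(minutes=3))
print(len(kept))  # 3
```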

to_ds(keep_data_vars=None, **kwargs)

Convert RINEX observations to xarray.Dataset with signal ID structure.

Parameters

outname : Path or str, optional
    If provided, saves the dataset to this file path.
keep_data_vars : list of str or None, optional
    Data variables to include in the dataset. Defaults to the config value.
write_global_attrs : bool, default False
    If True, adds comprehensive global attributes.
pad_global_sid : bool, default True
    If True, pads to the global signal ID space.
strip_fillval : bool, default True
    If True, removes fill values.
add_future_datavars : bool, default True
    If True, adds placeholder variables for future data.
keep_sids : list of str or None, default None
    If provided, filters/pads the dataset to these specific SIDs.
    If None and pad_global_sid=True, pads to all possible SIDs.

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid) and the requested data variables.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert RINEX observations to xarray.Dataset with signal ID structure.

    Parameters
    ----------
    outname : Path or str, optional
        If provided, saves dataset to this file path
    keep_data_vars : list of str or None, optional
        Data variables to include in dataset. Defaults to config value.
    write_global_attrs : bool, default False
        If True, adds comprehensive global attributes
    pad_global_sid : bool, default True
        If True, pads to global signal ID space
    strip_fillval : bool, default True
        If True, removes fill values
    add_future_datavars : bool, default True
        If True, adds placeholder variables for future data
    keep_sids : list of str or None, default None
        If provided, filters/pads dataset to these specific SIDs.
        If None and pad_global_sid=True, pads to all possible SIDs.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid) and requested data variables

    """
    outname = cast(Path | str | None, kwargs.pop("outname", None))
    write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
    pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
    strip_fillval = bool(kwargs.pop("strip_fillval", True))
    add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
    keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

    if keep_data_vars is None:
        from canvod.utils.config import load_config

        keep_data_vars = load_config().processing.processing.keep_rnx_vars

    ds = self.create_rinex_netcdf_with_signal_id()

    # drop unwanted vars
    for var in list(ds.data_vars):
        if var not in keep_data_vars:
            ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        # Pad/filter to specified sids or all possible sids
        ds = pad_to_global_sid(ds, keep_sids=keep_sids)

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    if add_future_datavars:
        pass

    if write_global_attrs:
        ds.attrs.update(self._create_comprehensive_attrs())

    ds.attrs.update(self._build_attrs())

    if outname:
        from canvod.utils.config import load_config as _load_config

        comp = _load_config().processing.compression
        encoding = {
            var: {"zlib": comp.zlib, "complevel": comp.complevel}
            for var in ds.data_vars
        }
        ds.to_netcdf(str(outname), encoding=encoding)

    # Validate output structure for pipeline compatibility
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds
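
The option handling above follows a pop-with-default pattern: each known keyword is popped from `**kwargs` with its documented default. A minimal sketch of that pattern (the `unhandled` bucket is an addition for illustration; the real method simply leaves unrecognized leftovers untouched):

```python
def consume_options(**kwargs: object) -> dict[str, object]:
    """Pop known options with defaults, mirroring the to_ds() pattern."""
    opts = {
        "write_global_attrs": bool(kwargs.pop("write_global_attrs", False)),
        "pad_global_sid": bool(kwargs.pop("pad_global_sid", True)),
        "strip_fillval": bool(kwargs.pop("strip_fillval", True)),
    }
    opts["unhandled"] = dict(kwargs)  # anything left over was not recognized
    return opts

print(consume_options(pad_global_sid=False, typo_option=1))
# {'write_global_attrs': False, 'pad_global_sid': False, 'strip_fillval': True, 'unhandled': {'typo_option': 1}}
```

One consequence of this design is that a misspelled option name does not raise; surfacing leftovers, as the sketch does, is one way to catch such typos.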

validate_rinex_304_compliance(ds=None, strict=False, print_report=True)

Run enhanced RINEX 3.04 specification validation.

Validates:
1. System-specific observation codes
2. GLONASS mandatory fields (slot/frequency, biases)
3. Phase shift records (RINEX 3.01+)
4. Observation value ranges

Parameters

ds : xr.Dataset, optional
    Dataset to validate. If None, creates one from the current file.
strict : bool
    If True, raise ValueError on validation failures.
print_report : bool
    If True, print a validation report to the console.

Returns

dict[str, list[str]]
    Validation results by category.

Examples

>>> reader = Rnxv3Obs(fpath="station.24o")
>>> results = reader.validate_rinex_304_compliance()
>>> # Or validate a specific dataset
>>> ds = reader.to_ds()
>>> results = reader.validate_rinex_304_compliance(ds=ds)

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_rinex_304_compliance(
    self,
    ds: xr.Dataset | None = None,
    strict: bool = False,
    print_report: bool = True,
) -> dict[str, list[str]]:
    """Run enhanced RINEX 3.04 specification validation.

    Validates:
    1. System-specific observation codes
    2. GLONASS mandatory fields (slot/frequency, biases)
    3. Phase shift records (RINEX 3.01+)
    4. Observation value ranges

    Parameters
    ----------
    ds : xr.Dataset, optional
        Dataset to validate. If None, creates one from current file.
    strict : bool
        If True, raise ValueError on validation failures
    print_report : bool
        If True, print validation report to console

    Returns
    -------
    dict[str, list[str]]
        Validation results by category

    Examples
    --------
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> results = reader.validate_rinex_304_compliance()
    >>> # Or validate a specific dataset
    >>> ds = reader.to_ds()
    >>> results = reader.validate_rinex_304_compliance(ds=ds)

    """
    if ds is None:
        ds = self.to_ds(write_global_attrs=False)

    # Prepare header dict for validators
    header_dict: dict[str, Any] = {
        "obs_codes_per_system": self.header.obs_codes_per_system,
    }

    # Add GLONASS-specific headers if available
    if hasattr(self.header, "glonass_slot_frq"):
        header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

    if hasattr(self.header, "glonass_cod_phs_bis"):
        header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

    if hasattr(self.header, "phase_shift"):
        header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

    # Run validation
    results = RINEX304ComplianceValidator.validate_all(
        ds=ds, header_dict=header_dict, strict=strict
    )

    if print_report:
        RINEX304ComplianceValidator.print_validation_report(results)

    return results

adapt_existing_rnxv3obs_class(original_class_path=None)

Provide guidance to integrate the enhanced sid functionality.

This function provides guidance on how to modify the existing class to support the new sid structure alongside the current OFT structure.

Returns

str
    Integration instructions.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def adapt_existing_rnxv3obs_class(original_class_path: str | None = None) -> str:
    """Provide guidance to integrate the enhanced sid functionality.

    This function provides guidance on how to modify the existing class
    to support the new sid structure alongside the current OFT structure.

    Returns
    -------
    str
        Integration instructions

    """
    _ = original_class_path
    return """
    INTEGRATION GUIDE: Adapting Rnxv3Obs for sid Structure
    ============================================================

    To integrate the new sid functionality into your existing Rnxv3Obs class:

    1. ADD THE SIGNAL_ID_MAPPER CLASS:
       - Copy the SignalIDMapper class to your rinex_reader.py file
       - This handles the mapping logic and band properties

    2. ADD NEW METHODS TO Rnxv3Obs CLASS:

       Method: create_rinex_netcdf_with_signal_id()
       - Copy from EnhancedRnxv3Obs.create_rinex_netcdf_with_signal_id()
       - This creates the new sid-based structure

       Method: filter_by_overlapping_groups()
       - Copy from EnhancedRnxv3Obs.filter_by_overlapping_groups()
       - Handles overlapping signal filtering (Problem A solution)

       Method: to_ds()
       - Copy from EnhancedRnxv3Obs.to_ds()
       - Main interface for creating sid datasets

       Method: create_legacy_compatible_dataset()
       - Copy from EnhancedRnxv3Obs.create_legacy_compatible_dataset()
       - Provides backward compatibility

    3. UPDATE THE __init__ METHOD:
       Add: self.signal_mapper = SignalIDMapper()

    4. MODIFY EXISTING METHODS:
       - Keep existing create_rinex_netcdf_with_oft() for OFT compatibility
       - Add sid option to your main interface methods
       - Update data handlers to support sid dimension

    5. UPDATE DATA_HANDLER/RNX_PARSER.PY:
       - Modify concatenate_datasets() to handle sid dimension
       - Add sid detection alongside OFT detection
       - Update encoding to handle sid string coordinates

    6. UPDATE PROCESSOR/PROCESSOR.PY:
       - Add sid support to create_common_space_datatree()
       - Handle both OFT and sid structures in alignment logic

    BENEFITS OF THIS STRUCTURE:
    ===========================

    ✓ Solves Problem A: Bandwidth overlap handling
      - Overlapping signals kept separate with metadata for filtering
      - band properties include bandwidth information

    ✓ Solves Problem B: code-specific performance differences
      - Each sv|band|code combination gets unique sid
      - No more priority-based LUT - all combinations preserved

    ✓ Maintains compatibility:
      - Legacy conversion available
      - OFT structure still supported
      - Existing code continues to work

    ✓ Enhanced filtering capabilities:
      - Filter by system, band, code independently
      - Complex filtering with multiple criteria
      - Overlap group filtering for analysis

    MIGRATION PATH:
    ===============

    Phase 1: Add sid methods alongside existing OFT methods
    Phase 2: Update data handlers to support both structures
    Phase 3: Gradually migrate analysis code to use sid
    Phase 4: Deprecate old frequency-mapping approach (optional)

    EXAMPLE USAGE AFTER INTEGRATION:
    =================================

    # Create datasets with different structures
    ds_oft = rnx.create_rinex_netcdf_with_oft()           # Current OFT structure
    ds_signal = rnx.create_rinex_netcdf_with_signal_id()  # New sid structure
    ds_legacy = rnx.create_rinex_netcdf(mapped_epochs)    # Legacy structure

    # Advanced sid usage
    ds_enhanced = rnx.to_ds(
        keep_data_vars=["SNR", "Phase"],
        apply_overlap_filter=True,
        overlap_preferences={'L1_E1_B1I': 'L1'}  # Prefer GPS L1 over Galileo E1
    )
    """

Base Reader

Abstract base class for GNSS data readers.

Defines the interface that all readers (RINEX v3, RINEX v2, SBF, future formats) must implement to ensure compatibility with the downstream pipeline:
- VOD calculation (canvod-vod)
- Storage (canvod-store / MyIcechunkStore)
- Grid operations (canvod-grids)

Contract constants (REQUIRED_DIMS, REQUIRED_COORDS, etc.) are the single source of truth for the output Dataset structure. Use :func:validate_dataset to check any Dataset against them.

DatasetStructureValidator

Bases: BaseModel

Validates that an xarray.Dataset meets the GNSSDataReader contract.

Wraps a Dataset and checks it against the contract constants above. Use this in tests and reader implementations to catch structural errors early with clear messages.

Examples

>>> validator = DatasetStructureValidator(dataset=ds)
>>> validator.validate_all()          # raises ValueError on any violation
>>> validator.validate_dimensions()   # check just one aspect

Source code in packages/canvod-readers/src/canvod/readers/base.py
class DatasetStructureValidator(BaseModel):
    """Validates that an xarray.Dataset meets the GNSSDataReader contract.

    Wraps a Dataset and checks it against the contract constants above.
    Use this in tests and reader implementations to catch structural errors
    early with clear messages.

    Examples
    --------
    >>> validator = DatasetStructureValidator(dataset=ds)
    >>> validator.validate_all()          # raises ValueError on any violation
    >>> validator.validate_dimensions()   # check just one aspect
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    dataset: xr.Dataset

    def validate_all(self, required_vars: list[str] | None = None) -> None:
        """Run all validations, collecting **all** errors.

        Delegates to :func:`validate_dataset` so the logic lives in one place.
        """
        validate_dataset(self.dataset, required_vars=required_vars)

    def validate_dimensions(self) -> None:
        """Check that required dimensions (epoch, sid) exist."""
        missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
        if missing:
            raise ValueError(f"Missing required dimensions: {missing}")

    def validate_coordinates(self) -> None:
        """Check that required coordinates exist with correct dtypes."""
        for coord, expected_dtype in REQUIRED_COORDS.items():
            if coord not in self.dataset.coords:
                raise ValueError(f"Missing required coordinate: {coord}")
            actual = str(self.dataset[coord].dtype)
            if expected_dtype == "object":
                is_valid_string = actual == "object" or actual.startswith("StringDType")
                if not is_valid_string:
                    raise ValueError(
                        f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                    )
            elif expected_dtype not in actual:
                raise ValueError(
                    f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
                )

    def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
        """Check that required data variables exist with correct dims."""
        if required_vars is None:
            required_vars = list(DEFAULT_REQUIRED_VARS)
        missing = set(required_vars) - set(self.dataset.data_vars)
        if missing:
            raise ValueError(f"Missing required data variables: {missing}")
        for var in self.dataset.data_vars:
            if self.dataset[var].dims != REQUIRED_DIMS:
                raise ValueError(
                    f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                    f"got {self.dataset[var].dims}"
                )

    def validate_attributes(self) -> None:
        """Check that required global attributes are present."""
        missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")
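
As the docstring notes, validate_all collects every violation before raising (via validate_dataset), whereas the individual checks above fail fast on the first problem. The collection pattern is roughly the following sketch, with made-up check functions standing in for the real dimension/coordinate/attribute validators:

```python
def validate_all_collecting(checks) -> None:
    """Run every check, gather messages, and raise once with the full list."""
    errors: list[str] = []
    for check in checks:
        try:
            check()
        except ValueError as e:
            errors.append(str(e))
    if errors:
        raise ValueError("Dataset contract violations:\n- " + "\n- ".join(errors))

def bad_dims():  # hypothetical failing check
    raise ValueError("Missing required dimensions: {'sid'}")

def good_attrs():  # hypothetical passing check
    pass

try:
    validate_all_collecting([bad_dims, good_attrs])
except ValueError as e:
    print(e)
```

Collecting all errors at once is the friendlier behavior for tests: one run reports every structural problem instead of revealing them one at a time.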

validate_all(required_vars=None)

Run all validations, collecting all errors.

Delegates to :func:validate_dataset so the logic lives in one place.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_all(self, required_vars: list[str] | None = None) -> None:
    """Run all validations, collecting **all** errors.

    Delegates to :func:`validate_dataset` so the logic lives in one place.
    """
    validate_dataset(self.dataset, required_vars=required_vars)

validate_dimensions()

Check that required dimensions (epoch, sid) exist.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dimensions(self) -> None:
    """Check that required dimensions (epoch, sid) exist."""
    missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
    if missing:
        raise ValueError(f"Missing required dimensions: {missing}")

validate_coordinates()

Check that required coordinates exist with correct dtypes.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_coordinates(self) -> None:
    """Check that required coordinates exist with correct dtypes."""
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in self.dataset.coords:
            raise ValueError(f"Missing required coordinate: {coord}")
        actual = str(self.dataset[coord].dtype)
        if expected_dtype == "object":
            is_valid_string = actual == "object" or actual.startswith("StringDType")
            if not is_valid_string:
                raise ValueError(
                    f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                )
        elif expected_dtype not in actual:
            raise ValueError(
                f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
            )

validate_data_variables(required_vars=None)

Check that required data variables exist with correct dims.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
    """Check that required data variables exist with correct dims."""
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)
    missing = set(required_vars) - set(self.dataset.data_vars)
    if missing:
        raise ValueError(f"Missing required data variables: {missing}")
    for var in self.dataset.data_vars:
        if self.dataset[var].dims != REQUIRED_DIMS:
            raise ValueError(
                f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                f"got {self.dataset[var].dims}"
            )

validate_attributes()

Check that required global attributes are present.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_attributes(self) -> None:
    """Check that required global attributes are present."""
    missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
    if missing:
        raise ValueError(f"Missing required attributes: {missing}")

SignalID

Bases: BaseModel

Validated signal identifier (SV + band + code).

>>> sid = SignalID(sv="G01", band="L1", code="C")
>>> str(sid)
'G01|L1|C'
>>> sid.system
'G'

Source code in packages/canvod-readers/src/canvod/readers/base.py
class SignalID(BaseModel):
    """Validated signal identifier (SV + band + code).

    >>> sid = SignalID(sv="G01", band="L1", code="C")
    >>> str(sid)
    'G01|L1|C'
    >>> sid.system
    'G'
    """

    model_config = ConfigDict(frozen=True)

    sv: str
    band: str
    code: str

    @field_validator("sv")
    @classmethod
    def _validate_sv(cls, v: str) -> str:
        if not SV_PATTERN.match(v):
            raise ValueError(
                f"Invalid SV: {v!r} — expected system letter + 2-digit PRN "
                f"(e.g. 'G01'). Valid systems: G, R, E, C, J, S, I"
            )
        return v

    @property
    def system(self) -> str:
        """GNSS system letter (e.g. 'G' for GPS)."""
        return self.sv[0]

    @property
    def sid(self) -> str:
        """Full signal ID string ('SV|band|code')."""
        return f"{self.sv}|{self.band}|{self.code}"

    def __str__(self) -> str:
        return self.sid

    def __hash__(self) -> int:
        return hash(self.sid)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SignalID):
            return self.sid == other.sid
        return NotImplemented

    @classmethod
    def from_string(cls, sid_str: str) -> SignalID:
        """Parse a signal ID string ('SV|band|code') into a SignalID.

        Parameters
        ----------
        sid_str : str
            Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

        Returns
        -------
        SignalID
            Validated signal identifier.

        Raises
        ------
        ValueError
            If the string does not have exactly three pipe-separated parts.
        """
        parts = sid_str.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
            )
        return cls(sv=parts[0], band=parts[1], code=parts[2])
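
The parse-then-validate flow of from_string, split on the pipe character and check the SV field against SV_PATTERN, can be sketched without pydantic. The regex below is an assumed shape of the real SV_PATTERN (system letter from G, R, E, C, J, S, I plus a two-digit PRN); the exact pattern in base.py may differ:

```python
import re

SV_PATTERN = re.compile(r"^[GRECJSI]\d{2}$")  # assumption: mirrors the real pattern

def parse_sid(sid_str: str) -> tuple[str, str, str]:
    """Split 'SV|band|code' and validate the SV field."""
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(f"Invalid SID format: {sid_str!r}, expected 'SV|band|code'")
    sv, band, code = parts
    if not SV_PATTERN.match(sv):
        raise ValueError(f"Invalid SV: {sv!r}")
    return sv, band, code

print(parse_sid("G01|L1|C"))  # ('G01', 'L1', 'C')
```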

system property

GNSS system letter (e.g. 'G' for GPS).

sid property

Full signal ID string ('SV|band|code').

from_string(sid_str) classmethod

Parse a signal ID string ('SV|band|code') into a SignalID.

Parameters

sid_str : str
    Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

Returns

SignalID
    Validated signal identifier.

Raises

ValueError
    If the string does not have exactly three pipe-separated parts.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@classmethod
def from_string(cls, sid_str: str) -> SignalID:
    """Parse a signal ID string ('SV|band|code') into a SignalID.

    Parameters
    ----------
    sid_str : str
        Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

    Returns
    -------
    SignalID
        Validated signal identifier.

    Raises
    ------
    ValueError
        If the string does not have exactly three pipe-separated parts.
    """
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
        )
    return cls(sv=parts[0], band=parts[1], code=parts[2])

GNSSDataReader

Bases: BaseModel, ABC

Abstract base class for all GNSS data format readers.

All readers must:
1. Inherit from this class
2. Implement all abstract methods
3. Return an xarray.Dataset that passes :func:validate_dataset
4. Provide a file hash for deduplication

This ensures compatibility with:
- canvod-vod: VOD calculation
- canvod-store: MyIcechunkStore storage
- canvod-grids: Grid projection operations

Subclasses may override model_config to set frozen, extra, etc. The base class provides arbitrary_types_allowed=True, which is needed by readers that use pint.Quantity or similar third-party types.

Examples

>>> class Rnxv3Obs(GNSSDataReader):
...     def to_ds(self, **kwargs) -> xr.Dataset:
...         # Implementation
...         return dataset
...
>>> reader = Rnxv3Obs(fpath="station.24o")
>>> ds = reader.to_ds()
>>> validate_dataset(ds)

Source code in packages/canvod-readers/src/canvod/readers/base.py
class GNSSDataReader(BaseModel, ABC):
    """Abstract base class for all GNSS data format readers.

    All readers must:
    1. Inherit from this class
    2. Implement all abstract methods
    3. Return xarray.Dataset that passes :func:`validate_dataset`
    4. Provide file hash for deduplication

    This ensures compatibility with:
    - canvod-vod: VOD calculation
    - canvod-store: MyIcechunkStore storage
    - canvod-grids: Grid projection operations

    Subclasses may override ``model_config`` to set ``frozen``, ``extra``,
    etc.  The base class provides ``arbitrary_types_allowed=True`` which is
    needed by readers that use ``pint.Quantity`` or similar third-party types.

    Examples
    --------
    >>> class Rnxv3Obs(GNSSDataReader):
    ...     def to_ds(self, **kwargs) -> xr.Dataset:
    ...         # Implementation
    ...         return dataset
    ...
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> ds = reader.to_ds()
    >>> validate_dataset(ds)
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path

    @field_validator("fpath")
    @classmethod
    def _validate_fpath(cls, v: Path) -> Path:
        """Validate that the file path points to an existing file."""
        v = Path(v)
        if not v.is_file():
            raise FileNotFoundError(f"File not found: {v}")
        return v

    @property
    def source_format(self) -> str:
        """Return the format identifier for this reader (e.g. ``"rinex3"``, ``"sbf"``)."""
        return "rinex3"

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """Return SHA256 hash of file for deduplication.

        Used by MyIcechunkStore to avoid duplicate ingestion.
        Must be deterministic and reproducible.

        Returns
        -------
        str
            Short hash (16 chars) or full hash of file content
        """

    @abstractmethod
    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert data to xarray.Dataset.

        Must return Dataset with structure:
        - Dims: (epoch, sid)
        - Coords: epoch, sid, sv, system, band, code, freq_*
        - Data vars: At minimum SNR
        - Attrs: Must include "File Hash"

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to include. If None, includes all available.
        **kwargs
            Implementation-specific parameters

        Returns
        -------
        xr.Dataset
            Dataset that passes :func:`validate_dataset`.
        """

    @abstractmethod
    def iter_epochs(self) -> Iterator[object]:
        """Iterate over epochs in the file.

        Yields
        ------
        Epoch
            Parsed epoch with satellites and observations.
        """

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Produce the obs dataset and any auxiliary datasets in a single call.

        Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
        Readers that produce metadata (e.g. SBF) override this to collect both
        in a single file scan.

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
            readers with no extra data (RINEX v2/v3).
        """
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

    def _build_attrs(self) -> dict[str, str]:
        """Build standard global attributes for the output Dataset.

        Reads institution/author from config, adds timestamp, version,
        and the file hash.

        Returns
        -------
        dict[str, str]
            Ready-to-use attrs dict.
        """
        from canvod.readers.gnss_specs.metadata import get_global_attrs
        from canvod.readers.gnss_specs.utils import get_version_from_pyproject

        attrs = get_global_attrs()
        attrs["Created"] = datetime.now(UTC).isoformat()
        attrs["Software"] = (
            f"{attrs['Software']}, Version: {get_version_from_pyproject()}"
        )
        attrs["File Hash"] = self.file_hash
        return attrs

    @property
    @abstractmethod
    def start_time(self) -> datetime:
        """Return start time of observations.

        Returns
        -------
        datetime
            First observation timestamp in the file.
        """

    @property
    @abstractmethod
    def end_time(self) -> datetime:
        """Return end time of observations.

        Returns
        -------
        datetime
            Last observation timestamp in the file.
        """

    @property
    @abstractmethod
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
        """

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Default implementation iterates epochs.  Subclasses may override
        with a faster approach.

        Returns
        -------
        int
            Total number of observation epochs.
        """
        return sum(1 for _ in self.iter_epochs())

    @property
    @abstractmethod
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.
        """

    def __repr__(self) -> str:
        """Return the string representation."""
        return f"{self.__class__.__name__}(file='{self.fpath.name}')"

source_format property

Return the format identifier for this reader (e.g. "rinex3", "sbf").

file_hash abstractmethod property

Return SHA256 hash of file for deduplication.

Used by MyIcechunkStore to avoid duplicate ingestion. Must be deterministic and reproducible.

Returns

str
    Short hash (16 chars) or full hash of file content
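A minimal sketch of how a concrete reader might compute this hash; the helper name and chunk size are illustrative, not part of canvod:

```python
import hashlib
from pathlib import Path


def sha256_short(fpath: Path, length: int = 16) -> str:
    """SHA256 of the file content, truncated to `length` hex chars.

    Deterministic for a given file, so it is safe for deduplication.
    """
    h = hashlib.sha256()
    with open(fpath, "rb") as f:
        # Read in 1 MiB chunks so large RINEX files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:length]
```
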

start_time abstractmethod property

Return start time of observations.

Returns

datetime
    First observation timestamp in the file.

end_time abstractmethod property

Return end time of observations.

Returns

datetime
    Last observation timestamp in the file.

systems abstractmethod property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
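These single-letter identifiers follow the RINEX v3 convention; a small lookup table (a sketch, not canvod API) makes downstream output more readable:

```python
# RINEX v3 system letter -> constellation name
SYSTEM_NAMES: dict[str, str] = {
    "G": "GPS",
    "R": "GLONASS",
    "E": "Galileo",
    "C": "BeiDou",
    "J": "QZSS",
    "S": "SBAS",
    "I": "NavIC (IRNSS)",
}


def system_name(letter: str) -> str:
    """Expand a RINEX system letter to its constellation name."""
    return SYSTEM_NAMES[letter]
```
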

num_epochs property

Return number of epochs in file.

Default implementation iterates epochs. Subclasses may override with a faster approach.

Returns

int
    Total number of observation epochs.

num_satellites abstractmethod property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

to_ds(keep_data_vars=None, **kwargs) abstractmethod

Convert data to xarray.Dataset.

Must return Dataset with structure:

- Dims: (epoch, sid)
- Coords: epoch, sid, sv, system, band, code, freq_*
- Data vars: At minimum SNR
- Attrs: Must include "File Hash"

Parameters

keep_data_vars : list of str, optional
    Data variables to include. If None, includes all available.
**kwargs
    Implementation-specific parameters

Returns

xr.Dataset
    Dataset that passes :func:`validate_dataset`.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert data to xarray.Dataset.

    Must return Dataset with structure:
    - Dims: (epoch, sid)
    - Coords: epoch, sid, sv, system, band, code, freq_*
    - Data vars: At minimum SNR
    - Attrs: Must include "File Hash"

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to include. If None, includes all available.
    **kwargs
        Implementation-specific parameters

    Returns
    -------
    xr.Dataset
        Dataset that passes :func:`validate_dataset`.
    """

iter_epochs() abstractmethod

Iterate over epochs in the file.

Yields

Epoch
    Parsed epoch with satellites and observations.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def iter_epochs(self) -> Iterator[object]:
    """Iterate over epochs in the file.

    Yields
    ------
    Epoch
        Parsed epoch with satellites and observations.
    """

to_ds_and_auxiliary(keep_data_vars=None, **kwargs)

Produce the obs dataset and any auxiliary datasets in a single call.

Default: calls to_ds(**kwargs) and returns an empty auxiliary dict. Readers that produce metadata (e.g. SBF) override this to collect both in a single file scan.

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"name": aux_ds, ...}). Auxiliary dict is empty for readers with no extra data (RINEX v2/v3).

Source code in packages/canvod-readers/src/canvod/readers/base.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Produce the obs dataset and any auxiliary datasets in a single call.

    Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
    Readers that produce metadata (e.g. SBF) override this to collect both
    in a single file scan.

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
        readers with no extra data (RINEX v2/v3).
    """
    return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

__repr__()

Return the string representation.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def __repr__(self) -> str:
    """Return the string representation."""
    return f"{self.__class__.__name__}(file='{self.fpath.name}')"

validate_dataset(ds, required_vars=None)

Validate ds meets the GNSSDataReader output contract.

Collects all violations and raises a single ValueError listing every problem, rather than stopping at the first failure.

Parameters

ds : xr.Dataset
    Dataset to validate.
required_vars : list of str, optional
    Data variables that must be present. Defaults to :data:`DEFAULT_REQUIRED_VARS` (["SNR"]).

Raises

ValueError
    If any contract violation is found.
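The collect-then-raise behaviour can be illustrated in isolation; `check_dims` below is a hypothetical stand-in for the dimension check only, not canvod API:

```python
def check_dims(dims, required=("epoch", "sid")) -> None:
    """Collect every violation first, then raise a single ValueError.

    Mirrors the aggregation pattern of validate_dataset: nothing is
    raised until all checks have run.
    """
    errors = [f"Missing required dimension: {d}" for d in required if d not in dims]
    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )
```

Because all problems are reported at once, a malformed dataset needs only one validate/fix cycle instead of one per error.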

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dataset(ds: xr.Dataset, required_vars: list[str] | None = None) -> None:
    """Validate *ds* meets the GNSSDataReader output contract.

    Collects **all** violations and raises a single ``ValueError`` listing
    every problem, rather than stopping at the first failure.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset to validate.
    required_vars : list of str, optional
        Data variables that must be present.  Defaults to
        :data:`DEFAULT_REQUIRED_VARS` (``["SNR"]``).

    Raises
    ------
    ValueError
        If any contract violation is found.
    """
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)

    errors: list[str] = []

    # -- dimensions --
    missing_dims = set(REQUIRED_DIMS) - set(ds.dims)
    if missing_dims:
        errors.append(f"Missing required dimensions: {missing_dims}")

    # -- coordinates --
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in ds.coords:
            errors.append(f"Missing required coordinate: {coord}")
            continue

        actual_dtype = str(ds[coord].dtype)
        if expected_dtype == "object":
            # Accept object (VariableLengthUTF8, stable Zarr V3) and numpy 2.0
            # StringDType (same stable type, different numpy representation).
            # Reject <U* (FixedLengthUTF32) — no stable Zarr V3 spec.
            is_valid_string = actual_dtype == "object" or actual_dtype.startswith(
                "StringDType"
            )
            if not is_valid_string:
                errors.append(
                    f"Coordinate {coord} has wrong dtype: "
                    f"expected string (object/StringDType), got {actual_dtype}"
                )
        elif expected_dtype not in actual_dtype:
            errors.append(
                f"Coordinate {coord} has wrong dtype: "
                f"expected {expected_dtype}, got {actual_dtype}"
            )

    # -- data variables --
    missing_vars = set(required_vars) - set(ds.data_vars)
    if missing_vars:
        errors.append(f"Missing required data variables: {missing_vars}")

    expected_var_dims = ("epoch", "sid")
    for var in ds.data_vars:
        if ds[var].dims != expected_var_dims:
            errors.append(
                f"Data variable {var} has wrong dimensions: "
                f"expected {expected_var_dims}, got {ds[var].dims}"
            )

    # -- attributes --
    missing_attrs = REQUIRED_ATTRS - set(ds.attrs.keys())
    if missing_attrs:
        errors.append(f"Missing required attributes: {missing_attrs}")

    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )

Dataset Builder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()   # validated Dataset

DatasetBuilder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Parameters

reader : GNSSDataReader
    The reader instance (used for _build_attrs() and file hash).
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels (default True).

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()   # validated Dataset

Source code in packages/canvod-readers/src/canvod/readers/builder.py
class DatasetBuilder:
    """Guided builder for constructing valid GNSSDataReader output Datasets.

    Handles coordinate arrays, dtype enforcement, frequency resolution,
    and contract validation automatically.

    Parameters
    ----------
    reader : GNSSDataReader
        The reader instance (used for ``_build_attrs()`` and file hash).
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels (default True).

    Examples
    --------
    >>> builder = DatasetBuilder(reader)
    >>> for epoch in reader.iter_epochs():
    ...     ei = builder.add_epoch(epoch.timestamp)
    ...     for obs in epoch.observations:
    ...         sig = builder.add_signal(sv="G01", band="L1", code="C")
    ...         builder.set_value(ei, sig, "SNR", 42.0)
    >>> ds = builder.build()   # validated Dataset
    """

    def __init__(
        self,
        reader: GNSSDataReader,
        *,
        aggregate_glonass_fdma: bool = True,
    ) -> None:
        self._reader = reader
        self._mapper = SignalIDMapper(aggregate_glonass_fdma=aggregate_glonass_fdma)
        self._signals: dict[str, SignalID] = {}
        self._epochs: list[datetime] = []
        self._values: dict[str, dict[tuple[int, str], float]] = {}

    def add_epoch(self, timestamp: datetime) -> int:
        """Register an epoch timestamp. Returns epoch index."""
        self._epochs.append(timestamp)
        return len(self._epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> SignalID:
        """Register a signal (idempotent). Returns validated SignalID."""
        sig = SignalID(sv=sv, band=band, code=code)
        self._signals[sig.sid] = sig
        return sig

    def set_value(
        self,
        epoch_idx: int,
        signal: SignalID | str,
        var: str,
        value: float,
    ) -> None:
        """Set a data value for a given epoch, signal, and variable.

        Parameters
        ----------
        epoch_idx : int
            Index returned by :meth:`add_epoch`.
        signal : SignalID or str
            Signal identifier (SignalID or 'SV|band|code' string).
        var : str
            Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
        value : float
            The observation value.
        """
        sid = str(signal)
        if var not in self._values:
            self._values[var] = {}
        self._values[var][(epoch_idx, sid)] = value

    def build(
        self,
        keep_data_vars: list[str] | None = None,
        extra_attrs: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Build, validate, and return the Dataset.

        1. Sorts signals alphabetically
        2. Resolves frequencies from band names via SignalIDMapper
        3. Constructs coordinate arrays with correct dtypes (float32 for freq)
        4. Attaches CF-compliant metadata from COORDS_METADATA
        5. Calls validate_dataset() before returning

        Parameters
        ----------
        keep_data_vars : list of str, optional
            If provided, only include these data variables.  If ``None``,
            includes all variables that had values set.
        extra_attrs : dict, optional
            Additional global attributes to merge into the Dataset.

        Returns
        -------
        xr.Dataset
            Validated Dataset with dimensions ``(epoch, sid)``.
        """
        sorted_sids = sorted(self._signals)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(self._epochs)
        n_sids = len(sorted_sids)

        # --- Coordinate arrays ---
        epoch_arr = [
            np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
            for ts in self._epochs
        ]
        sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
        system_arr = np.array(
            [self._signals[s].system for s in sorted_sids], dtype=object
        )
        band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
        code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

        # Frequency resolution via SignalIDMapper
        freq_center = np.array(
            [
                self._mapper.get_band_frequency(self._signals[s].band) or np.nan
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        bandwidths = np.array(
            [
                self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        freq_min = (freq_center - bandwidths / 2).astype(np.float32)
        freq_max = (freq_center + bandwidths / 2).astype(np.float32)

        # --- Determine which variables to include ---
        all_vars = set(self._values.keys())
        if keep_data_vars is not None:
            vars_to_build = [v for v in keep_data_vars if v in all_vars]
        else:
            vars_to_build = sorted(all_vars)

        # --- Data variable arrays ---
        data_vars: dict[str, tuple] = {}
        for var in vars_to_build:
            dtype = DTYPES.get(var, np.dtype("float32"))
            fill = np.nan if np.issubdtype(dtype, np.floating) else -1
            arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

            for (ei, sid_str), val in self._values[var].items():
                if sid_str in sid_to_idx:
                    arr[ei, sid_to_idx[sid_str]] = val

            meta = _VAR_METADATA.get(var, {})
            data_vars[var] = (("epoch", "sid"), arr, meta)

        # --- Coordinates ---
        coords = {
            "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
            "system": ("sid", system_arr, COORDS_METADATA["system"]),
            "band": ("sid", band_arr, COORDS_METADATA["band"]),
            "code": ("sid", code_arr, COORDS_METADATA["code"]),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        # --- Global attributes ---
        attrs = self._reader._build_attrs()
        if extra_attrs:
            attrs.update(extra_attrs)

        ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

        # Validate before returning
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

add_epoch(timestamp)

Register an epoch timestamp. Returns epoch index.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_epoch(self, timestamp: datetime) -> int:
    """Register an epoch timestamp. Returns epoch index."""
    self._epochs.append(timestamp)
    return len(self._epochs) - 1

add_signal(sv, band, code)

Register a signal (idempotent). Returns validated SignalID.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_signal(self, sv: str, band: str, code: str) -> SignalID:
    """Register a signal (idempotent). Returns validated SignalID."""
    sig = SignalID(sv=sv, band=band, code=code)
    self._signals[sig.sid] = sig
    return sig

set_value(epoch_idx, signal, var, value)

Set a data value for a given epoch, signal, and variable.

Parameters

epoch_idx : int
    Index returned by :meth:`add_epoch`.
signal : SignalID or str
    Signal identifier (SignalID or 'SV|band|code' string).
var : str
    Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
value : float
    The observation value.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def set_value(
    self,
    epoch_idx: int,
    signal: SignalID | str,
    var: str,
    value: float,
) -> None:
    """Set a data value for a given epoch, signal, and variable.

    Parameters
    ----------
    epoch_idx : int
        Index returned by :meth:`add_epoch`.
    signal : SignalID or str
        Signal identifier (SignalID or 'SV|band|code' string).
    var : str
        Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
    value : float
        The observation value.
    """
    sid = str(signal)
    if var not in self._values:
        self._values[var] = {}
    self._values[var][(epoch_idx, sid)] = value

build(keep_data_vars=None, extra_attrs=None)

Build, validate, and return the Dataset.

  1. Sorts signals alphabetically
  2. Resolves frequencies from band names via SignalIDMapper
  3. Constructs coordinate arrays with correct dtypes (float32 for freq)
  4. Attaches CF-compliant metadata from COORDS_METADATA
  5. Calls validate_dataset() before returning

Parameters

keep_data_vars : list of str, optional
    If provided, only include these data variables. If None, includes all variables that had values set.
extra_attrs : dict, optional
    Additional global attributes to merge into the Dataset.

Returns

xr.Dataset
    Validated Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def build(
    self,
    keep_data_vars: list[str] | None = None,
    extra_attrs: dict[str, str] | None = None,
) -> xr.Dataset:
    """Build, validate, and return the Dataset.

    1. Sorts signals alphabetically
    2. Resolves frequencies from band names via SignalIDMapper
    3. Constructs coordinate arrays with correct dtypes (float32 for freq)
    4. Attaches CF-compliant metadata from COORDS_METADATA
    5. Calls validate_dataset() before returning

    Parameters
    ----------
    keep_data_vars : list of str, optional
        If provided, only include these data variables.  If ``None``,
        includes all variables that had values set.
    extra_attrs : dict, optional
        Additional global attributes to merge into the Dataset.

    Returns
    -------
    xr.Dataset
        Validated Dataset with dimensions ``(epoch, sid)``.
    """
    sorted_sids = sorted(self._signals)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(self._epochs)
    n_sids = len(sorted_sids)

    # --- Coordinate arrays ---
    epoch_arr = [
        np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
        for ts in self._epochs
    ]
    sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
    system_arr = np.array(
        [self._signals[s].system for s in sorted_sids], dtype=object
    )
    band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
    code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

    # Frequency resolution via SignalIDMapper
    freq_center = np.array(
        [
            self._mapper.get_band_frequency(self._signals[s].band) or np.nan
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    bandwidths = np.array(
        [
            self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    freq_min = (freq_center - bandwidths / 2).astype(np.float32)
    freq_max = (freq_center + bandwidths / 2).astype(np.float32)

    # --- Determine which variables to include ---
    all_vars = set(self._values.keys())
    if keep_data_vars is not None:
        vars_to_build = [v for v in keep_data_vars if v in all_vars]
    else:
        vars_to_build = sorted(all_vars)

    # --- Data variable arrays ---
    data_vars: dict[str, tuple] = {}
    for var in vars_to_build:
        dtype = DTYPES.get(var, np.dtype("float32"))
        fill = np.nan if np.issubdtype(dtype, np.floating) else -1
        arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

        for (ei, sid_str), val in self._values[var].items():
            if sid_str in sid_to_idx:
                arr[ei, sid_to_idx[sid_str]] = val

        meta = _VAR_METADATA.get(var, {})
        data_vars[var] = (("epoch", "sid"), arr, meta)

    # --- Coordinates ---
    coords = {
        "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
        "system": ("sid", system_arr, COORDS_METADATA["system"]),
        "band": ("sid", band_arr, COORDS_METADATA["band"]),
        "code": ("sid", code_arr, COORDS_METADATA["code"]),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    # --- Global attributes ---
    attrs = self._reader._build_attrs()
    if extra_attrs:
        attrs.update(extra_attrs)

    ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

    # Validate before returning
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

GNSS Specifications

GNSS specifications and core characteristics.

This module contains fundamental GNSS definitions including:

- Constants: Unit registry, physical constants, RINEX parameters
- Exceptions: GNSS-specific error types
- Metadata: CF-compliant metadata for coordinates and observables
- Models: Pydantic validation models for RINEX data structures
- Signals: Signal ID mapping and band properties
- Utils: File hashing, version extraction, data type checks

These components are used across all GNSS reader implementations.

Directory Matching

Directory matching for RINEX data files.

Identifies and matches RINEX data directories across dates and receivers.

DataDirMatcher

Match RINEX data directories for canopy and reference receivers.

Scans a root directory structure to find dates with RINEX files present in both canopy and reference receiver directories.

Parameters

root : Path
    Root directory containing receiver subdirectories
reference_pattern : Path, optional
    Relative path pattern for reference receiver (default: "01_reference/01_GNSS/01_raw")
canopy_pattern : Path, optional
    Relative path pattern for canopy receiver (default: "02_canopy/01_GNSS/01_raw")

Examples

>>> from pathlib import Path
>>> matcher = DataDirMatcher(
...     root=Path("/data/01_Rosalia"),
...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
... )

>>> # Iterate over matched directories
>>> for matched_dirs in matcher:
...     print(matched_dirs.yyyydoy)
...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
...     print(f"  Found {len(rinex_files)} RINEX files")

>>> # Get list of common dates
>>> dates = matcher.get_common_dates()
>>> print(f"Found {len(dates)} dates with data")
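The date-directory names can be decoded without canvod; the helper below is a hypothetical sketch assuming two-digit-year "YYDOY" strings (e.g. "25032"), the format that `YYYYDOY.from_yydoy_str` appears to consume:

```python
from datetime import date, timedelta


def yydoy_to_date(s: str) -> date:
    """Convert a 'YYDOY' string such as '25032' to a calendar date.

    Illustrative helper, not canvod API; assumes years 2000-2099.
    """
    yy, doy = int(s[:2]), int(s[2:])
    # Day-of-year 1 is January 1st of the given year.
    return date(2000 + yy, 1, 1) + timedelta(days=doy - 1)
```
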

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class DataDirMatcher:
    """Match RINEX data directories for canopy and reference receivers.

    Scans a root directory structure to find dates with RINEX files
    present in both canopy and reference receiver directories.

    Parameters
    ----------
    root : Path
        Root directory containing receiver subdirectories
    reference_pattern : Path, optional
        Relative path pattern for reference receiver
        (default: "01_reference/01_GNSS/01_raw")
    canopy_pattern : Path, optional
        Relative path pattern for canopy receiver
        (default: "02_canopy/01_GNSS/01_raw")

    Examples
    --------
    >>> from pathlib import Path
    >>> matcher = DataDirMatcher(
    ...     root=Path("/data/01_Rosalia"),
    ...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    ...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
    ... )
    >>>
    >>> # Iterate over matched directories
    >>> for matched_dirs in matcher:
    ...     print(matched_dirs.yyyydoy)
    ...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    ...     print(f"  Found {len(rinex_files)} RINEX files")

    >>> # Get list of common dates
    >>> dates = matcher.get_common_dates()
    >>> print(f"Found {len(dates)} dates with data")

    """

    def __init__(
        self,
        root: Path,
        reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
        canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
    ) -> None:
        """Initialize matcher with directory structure."""
        import warnings

        warnings.warn(
            "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.root = Path(root)
        self.reference_dir = self.root / reference_pattern
        self.canopy_dir = self.root / canopy_pattern

        # Validate directories exist
        self._validate_directory(self.root, "Root")
        self._validate_directory(self.reference_dir, "Reference")
        self._validate_directory(self.canopy_dir, "Canopy")

    def __iter__(self) -> Iterator[MatchedDirs]:
        """Iterate over matched directory pairs with RINEX files.

        Yields
        ------
        MatchedDirs
            Matched directories for each date.

        """
        for date_str in self.get_common_dates():
            yield MatchedDirs(
                canopy_data_dir=self.canopy_dir / date_str,
                reference_data_dir=self.reference_dir / date_str,
                yyyydoy=YYYYDOY.from_yydoy_str(date_str),
            )

    def get_common_dates(self) -> list[str]:
        """Get dates with RINEX files in both receivers.

        Uses parallel processing to check directories efficiently.

        Returns
        -------
        list[str]
            Sorted list of date strings (YYDDD format, e.g., "25001")
            that have RINEX files in both canopy and reference directories.

        """
        # Find dates with RINEX in each receiver
        ref_dates = self._get_dates_with_rinex(self.reference_dir)
        can_dates = self._get_dates_with_rinex(self.canopy_dir)

        # Find intersection
        common = ref_dates & can_dates
        common.discard("00000")  # Remove placeholder directories

        # Sort naturally (numerical order)
        return natsorted(common)

    def _get_dates_with_rinex(self, base_dir: Path) -> set[str]:
        """Find all date directories containing RINEX files.

        Uses parallel processing to check multiple directories at once.

        Parameters
        ----------
        base_dir : Path
            Base directory to search (e.g., canopy or reference root).

        Returns
        -------
        set[str]
            Set of date directory names that contain RINEX files.

        """
        # Get all subdirectories
        date_dirs = (d for d in base_dir.iterdir() if d.is_dir())

        # Check for RINEX files in parallel
        dates_with_rinex = set()

        with ThreadPoolExecutor() as executor:
            future_to_dir = {
                executor.submit(self._has_rinex_files, d): d for d in date_dirs
            }

            for future in as_completed(future_to_dir):
                directory = future_to_dir[future]
                if future.result():
                    dates_with_rinex.add(directory.name)

        return dates_with_rinex

    @staticmethod
    def _has_rinex_files(directory: Path) -> bool:
        """Check if directory contains RINEX observation files.

        Parameters
        ----------
        directory : Path
            Directory to check.

        Returns
        -------
        bool
            True if RINEX files found.

        """
        return _has_rinex_files(directory)

    def _validate_directory(self, path: Path, name: str) -> None:
        """Validate directory exists.

        Parameters
        ----------
        path : Path
            Directory to check.
        name : str
            Name for error message.

        Raises
        ------
        FileNotFoundError
            If directory doesn't exist.

        """
        if not path.exists():
            msg = f"{name} directory not found: {path}"
            raise FileNotFoundError(msg)

__init__(root, reference_pattern=Path('01_reference/01_GNSS/01_raw'), canopy_pattern=Path('02_canopy/01_GNSS/01_raw'))

Initialize matcher with directory structure.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    root: Path,
    reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
) -> None:
    """Initialize matcher with directory structure."""
    import warnings

    warnings.warn(
        "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.root = Path(root)
    self.reference_dir = self.root / reference_pattern
    self.canopy_dir = self.root / canopy_pattern

    # Validate directories exist
    self._validate_directory(self.root, "Root")
    self._validate_directory(self.reference_dir, "Reference")
    self._validate_directory(self.canopy_dir, "Canopy")

__iter__()

Iterate over matched directory pairs with RINEX files.

Yields

MatchedDirs
    Matched directories for each date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[MatchedDirs]:
    """Iterate over matched directory pairs with RINEX files.

    Yields
    ------
    MatchedDirs
        Matched directories for each date.

    """
    for date_str in self.get_common_dates():
        yield MatchedDirs(
            canopy_data_dir=self.canopy_dir / date_str,
            reference_data_dir=self.reference_dir / date_str,
            yyyydoy=YYYYDOY.from_yydoy_str(date_str),
        )

get_common_dates()

Get dates with RINEX files in both receivers.

Uses parallel processing to check directories efficiently.

Returns

list[str]
    Sorted list of date strings (YYDDD format, e.g., "25001") that have RINEX files in both canopy and reference directories.
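The YYDDD strings returned here pack a two-digit year and a day-of-year. A stdlib sketch of converting such a string to a calendar date, assuming all years fall in 2000-2099:

```python
from datetime import datetime

def yyddd_to_date(yyddd: str) -> datetime:
    """Parse a YYDDD string like '25001' (assumes 20xx years)."""
    # %j is the zero-padded day of year (001-366).
    return datetime.strptime("20" + yyddd, "%Y%j")
```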

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def get_common_dates(self) -> list[str]:
    """Get dates with RINEX files in both receivers.

    Uses parallel processing to check directories efficiently.

    Returns
    -------
    list[str]
        Sorted list of date strings (YYDDD format, e.g., "25001")
        that have RINEX files in both canopy and reference directories.

    """
    # Find dates with RINEX in each receiver
    ref_dates = self._get_dates_with_rinex(self.reference_dir)
    can_dates = self._get_dates_with_rinex(self.canopy_dir)

    # Find intersection
    common = ref_dates & can_dates
    common.discard("00000")  # Remove placeholder directories

    # Sort naturally (numerical order)
    return natsorted(common)
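The matching logic in `get_common_dates` reduces to a set intersection over date-directory names. A self-contained illustration, using plain `sorted` with a numeric key as a stand-in for `natsorted` (for pure digit strings the two give the same order):

```python
# Date directories found under each receiver (toy data).
ref_dates = {"25001", "25002", "25010", "00000"}
can_dates = {"25002", "25010", "25011", "00000"}

# Keep only dates present in both receivers.
common = ref_dates & can_dates
common.discard("00000")  # drop placeholder directories

# natsorted orders "25002" before "25010"; sorting digit strings
# by integer value reproduces that natural order.
dates = sorted(common, key=int)
```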

PairDataDirMatcher

Match RINEX directories for receiver pairs across dates.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site. Requires a configuration dict specifying receiver locations and analysis pairs.

Parameters

base_dir : Path
    Root directory containing all receiver data.
receivers : dict
    Receiver configuration mapping receiver names to their directory paths. The directory value is the full relative path from base_dir to the raw RINEX data directory (before the {YYDOY} date folders).
    Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"}, "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
analysis_pairs : dict
    Analysis pair configuration specifying which receivers to match.
    Example: {"pair_01": {"canopy_receiver": "canopy_01", "reference_receiver": "reference_01"}}

Examples

>>> receivers = {
...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
... }
>>> pairs = {
...     "main_pair": {
...         "canopy_receiver": "canopy_01",
...         "reference_receiver": "reference_01"
...     }
... }

>>> matcher = PairDataDirMatcher(
...     base_dir=Path("/data/01_Rosalia"),
...     receivers=receivers,
...     analysis_pairs=pairs
... )

>>> for matched in matcher:
...     print(f"{matched.yyyydoy}: {matched.pair_name}")
...     print(f"  Canopy: {matched.canopy_data_dir}")
...     print(f"  Reference: {matched.reference_data_dir}")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class PairDataDirMatcher:
    """Match RINEX directories for receiver pairs across dates.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site. Requires a configuration dict
    specifying receiver locations and analysis pairs.

    Parameters
    ----------
    base_dir : Path
        Root directory containing all receiver data
    receivers : dict
        Receiver configuration mapping receiver names to their directory paths.
        The ``directory`` value is the full relative path from ``base_dir`` to the
        raw RINEX data directory (before the ``{YYDOY}`` date folders).
        Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
                  "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
    analysis_pairs : dict
        Analysis pair configuration specifying which receivers to match
        Example: {"pair_01": {"canopy_receiver": "canopy_01",
                               "reference_receiver": "reference_01"}}

    Examples
    --------
    >>> receivers = {
    ...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    ...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
    ... }
    >>> pairs = {
    ...     "main_pair": {
    ...         "canopy_receiver": "canopy_01",
    ...         "reference_receiver": "reference_01"
    ...     }
    ... }
    >>>
    >>> matcher = PairDataDirMatcher(
    ...     base_dir=Path("/data/01_Rosalia"),
    ...     receivers=receivers,
    ...     analysis_pairs=pairs
    ... )
    >>>
    >>> for matched in matcher:
    ...     print(f"{matched.yyyydoy}: {matched.pair_name}")
    ...     print(f"  Canopy: {matched.canopy_data_dir}")
    ...     print(f"  Reference: {matched.reference_data_dir}")

    """

    def __init__(
        self,
        base_dir: Path,
        receivers: dict[str, dict[str, str]],
        analysis_pairs: dict[str, dict[str, str]],
    ) -> None:
        """Initialize pair matcher with receiver configuration."""
        import warnings

        warnings.warn(
            "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.base_dir = Path(base_dir)
        self.receivers = receivers
        self.analysis_pairs = analysis_pairs

        # Validate receivers have directory config
        self.receiver_dirs = self._build_receiver_dir_mapping()

    def _build_receiver_dir_mapping(self) -> dict[str, str]:
        """Map receiver names to their directory prefixes.

        Returns
        -------
        dict[str, str]
            Mapping of receiver name to directory path.

        Raises
        ------
        ValueError
            If receiver missing 'directory' in config.

        """
        mapping = {}
        for receiver_name, config in self.receivers.items():
            if "directory" not in config:
                msg = f"Receiver '{receiver_name}' missing 'directory' in config"
                raise ValueError(msg)
            mapping[receiver_name] = config["directory"]
        return mapping

    def _get_receiver_path(self, receiver_name: str, yyyydoy: YYYYDOY) -> Path:
        """Build full path to receiver data for a specific date.

        Parameters
        ----------
        receiver_name : str
            Receiver name (e.g., "canopy_01").
        yyyydoy : YYYYDOY
            Date object.

        Returns
        -------
        Path
            Full path to receiver's RINEX directory for the date.

        """
        receiver_dir = self.receiver_dirs[receiver_name]

        # Convert YYYYDDD to YYDDD format for directory name
        yyddd_str = yyyydoy.yydoy
        if yyddd_str is None:
            msg = f"Missing YYDDD representation for date {yyyydoy}"
            raise ValueError(msg)

        return self.base_dir / receiver_dir / yyddd_str

    def _get_all_dates(self) -> set[YYYYDOY]:
        """Find all dates that have data in any receiver directory.

        Returns
        -------
        set[YYYYDOY]
            Set of all dates with available data.

        """
        all_dates = set()

        for receiver_name in self.receivers:
            receiver_dir = self.receiver_dirs[receiver_name]
            receiver_base = self.base_dir / receiver_dir

            if not receiver_base.exists():
                continue

            # Find all date directories (format: YYDDD - 5 digits)
            for date_dir in receiver_base.iterdir():
                if not date_dir.is_dir():
                    continue

                # Check if directory name is 5 digits
                if len(date_dir.name) != DATE_DIR_LEN or not date_dir.name.isdigit():
                    continue

                # Skip placeholder directories
                if date_dir.name == "00000":
                    continue

                try:
                    yyyydoy = YYYYDOY.from_yydoy_str(date_dir.name)
                    all_dates.add(yyyydoy)
                except ValueError:
                    continue

        return all_dates

    def __iter__(self) -> Iterator[PairMatchedDirs]:
        """Iterate over all date/pair combinations with available data.

        Yields
        ------
        PairMatchedDirs
            Matched directories for a receiver pair on a specific date.

        """
        all_dates = sorted(self._get_all_dates())

        for yyyydoy in all_dates:
            # For each configured analysis pair
            for pair_name, pair_config in self.analysis_pairs.items():
                canopy_rx = pair_config["canopy_receiver"]
                reference_rx = pair_config["reference_receiver"]

                # Build paths for this pair
                canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
                reference_path = self._get_receiver_path(reference_rx, yyyydoy)

                # Check for RINEX files
                canopy_has_files = _has_rinex_files(canopy_path)
                reference_has_files = _has_rinex_files(reference_path)

                # Only yield if both directories exist and have data
                if canopy_has_files and reference_has_files:
                    yield PairMatchedDirs(
                        yyyydoy=yyyydoy,
                        pair_name=pair_name,
                        canopy_receiver=canopy_rx,
                        reference_receiver=reference_rx,
                        canopy_data_dir=canopy_path,
                        reference_data_dir=reference_path,
                    )

__init__(base_dir, receivers, analysis_pairs)

Initialize pair matcher with receiver configuration.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    base_dir: Path,
    receivers: dict[str, dict[str, str]],
    analysis_pairs: dict[str, dict[str, str]],
) -> None:
    """Initialize pair matcher with receiver configuration."""
    import warnings

    warnings.warn(
        "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.base_dir = Path(base_dir)
    self.receivers = receivers
    self.analysis_pairs = analysis_pairs

    # Validate receivers have directory config
    self.receiver_dirs = self._build_receiver_dir_mapping()

__iter__()

Iterate over all date/pair combinations with available data.

Yields

PairMatchedDirs
    Matched directories for a receiver pair on a specific date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[PairMatchedDirs]:
    """Iterate over all date/pair combinations with available data.

    Yields
    ------
    PairMatchedDirs
        Matched directories for a receiver pair on a specific date.

    """
    all_dates = sorted(self._get_all_dates())

    for yyyydoy in all_dates:
        # For each configured analysis pair
        for pair_name, pair_config in self.analysis_pairs.items():
            canopy_rx = pair_config["canopy_receiver"]
            reference_rx = pair_config["reference_receiver"]

            # Build paths for this pair
            canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
            reference_path = self._get_receiver_path(reference_rx, yyyydoy)

            # Check for RINEX files
            canopy_has_files = _has_rinex_files(canopy_path)
            reference_has_files = _has_rinex_files(reference_path)

            # Only yield if both directories exist and have data
            if canopy_has_files and reference_has_files:
                yield PairMatchedDirs(
                    yyyydoy=yyyydoy,
                    pair_name=pair_name,
                    canopy_receiver=canopy_rx,
                    reference_receiver=reference_rx,
                    canopy_data_dir=canopy_path,
                    reference_data_dir=reference_path,
                )

MatchedDirs dataclass

Matched directory paths for canopy and reference receivers.

Immutable container representing a pair of directories containing RINEX data for the same date.

Parameters

canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference (open-sky) receiver RINEX directory.
yyyydoy : YYYYDOY
    Date object for this matched pair.

Examples

>>> from pathlib import Path
>>> from canvod.utils.tools import YYYYDOY
>>>
>>> md = MatchedDirs(
...     canopy_data_dir=Path("/data/02_canopy/25001"),
...     reference_data_dir=Path("/data/01_reference/25001"),
...     yyyydoy=YYYYDOY.from_str("2025001")
... )
>>> md.yyyydoy.to_str()
'2025001'
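MatchedDirs is declared with `@dataclass(frozen=True)`, so attribute assignment after construction raises `FrozenInstanceError`. A stdlib illustration of the same pattern using a stand-in class (not the real MatchedDirs):

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path

@dataclass(frozen=True)
class FrozenDirs:  # stand-in for MatchedDirs, for illustration
    canopy_data_dir: Path
    reference_data_dir: Path

fd = FrozenDirs(Path("/data/canopy/25001"), Path("/data/ref/25001"))

mutated = True
try:
    fd.canopy_data_dir = Path("/elsewhere")  # rejected by frozen=True
except FrozenInstanceError:
    mutated = False
```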

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass(frozen=True)
class MatchedDirs:
    """Matched directory paths for canopy and reference receivers.

    Immutable container representing a pair of directories containing
    RINEX data for the same date.

    Parameters
    ----------
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference (open-sky) receiver RINEX directory.
    yyyydoy : YYYYDOY
        Date object for this matched pair.

    Examples
    --------
    >>> from pathlib import Path
    >>> from canvod.utils.tools import YYYYDOY
    >>>
    >>> md = MatchedDirs(
    ...     canopy_data_dir=Path("/data/02_canopy/25001"),
    ...     reference_data_dir=Path("/data/01_reference/25001"),
    ...     yyyydoy=YYYYDOY.from_str("2025001")
    ... )
    >>> md.yyyydoy.to_str()
    '2025001'

    """

    canopy_data_dir: Path
    reference_data_dir: Path
    yyyydoy: YYYYDOY

PairMatchedDirs dataclass

Matched directories for a receiver pair on a specific date.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site.

Parameters

yyyydoy : YYYYDOY
    Date for this matched pair.
pair_name : str
    Identifier for this receiver pair (e.g., "pair_01").
canopy_receiver : str
    Name of canopy receiver (e.g., "canopy_01").
reference_receiver : str
    Name of reference receiver (e.g., "reference_01").
canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference receiver RINEX directory.

Examples

>>> pmd = PairMatchedDirs(
...     yyyydoy=YYYYDOY.from_str("2025001"),
...     pair_name="pair_01",
...     canopy_receiver="canopy_01",
...     reference_receiver="reference_01",
...     canopy_data_dir=Path("/data/canopy_01/25001"),
...     reference_data_dir=Path("/data/reference_01/25001")
... )
>>> pmd.pair_name
'pair_01'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass
class PairMatchedDirs:
    """Matched directories for a receiver pair on a specific date.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site.

    Parameters
    ----------
    yyyydoy : YYYYDOY
        Date for this matched pair.
    pair_name : str
        Identifier for this receiver pair (e.g., "pair_01").
    canopy_receiver : str
        Name of canopy receiver (e.g., "canopy_01").
    reference_receiver : str
        Name of reference receiver (e.g., "reference_01").
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference receiver RINEX directory.

    Examples
    --------
    >>> pmd = PairMatchedDirs(
    ...     yyyydoy=YYYYDOY.from_str("2025001"),
    ...     pair_name="pair_01",
    ...     canopy_receiver="canopy_01",
    ...     reference_receiver="reference_01",
    ...     canopy_data_dir=Path("/data/canopy_01/25001"),
    ...     reference_data_dir=Path("/data/reference_01/25001")
    ... )
    >>> pmd.pair_name
    'pair_01'

    """

    yyyydoy: YYYYDOY
    pair_name: str
    canopy_receiver: str
    reference_receiver: str
    canopy_data_dir: Path
    reference_data_dir: Path