
canvod.readers API Reference

RINEX observation file parsing with validation and GNSS signal specifications.

Package

GNSS data format readers.

This package provides readers for various GNSS data formats, all implementing a common interface for seamless integration with processing pipelines.

Supported formats:

- RINEX v3.04 (GNSS observations)
- More formats coming soon

Quick Start

from canvod.readers import Rnxv3Obs

# Read RINEX v3 file
reader = Rnxv3Obs(fpath="station.24o")
dataset = reader.to_ds()

Or use the canvodpy factory for automatic format detection:

from canvodpy import ReaderFactory

# Auto-detects format from file header
reader = ReaderFactory.create_from_file("station.24o")
dataset = reader.to_ds()

Directory Matching:

from pathlib import Path

from canvod.readers import DataDirMatcher

# Find dates with RINEX files in both receivers
matcher = DataDirMatcher(root=Path("/data/01_Rosalia"))
for matched_dirs in matcher:
    print(matched_dirs.yyyydoy)
    # Load RINEX files from matched_dirs.canopy_data_dir

GNSSDataReader

Bases: BaseModel, ABC

Abstract base class for all GNSS data format readers.

All readers must:

1. Inherit from this class
2. Implement all abstract methods
3. Return an xarray.Dataset that passes validate_dataset()
4. Provide a file hash for deduplication

This ensures compatibility with:

- canvod-vod: VOD calculation
- canvod-store: MyIcechunkStore storage
- canvod-grids: grid projection operations

Subclasses may override model_config to set frozen, extra, etc. The base class provides arbitrary_types_allowed=True, which is needed by readers that use pint.Quantity or similar third-party types.

Examples

>>> class Rnxv3Obs(GNSSDataReader):
...     def to_ds(self, **kwargs) -> xr.Dataset:
...         # Implementation
...         return dataset
...
>>> reader = Rnxv3Obs(fpath="station.24o")
>>> ds = reader.to_ds()
>>> validate_dataset(ds)

Source code in packages/canvod-readers/src/canvod/readers/base.py
class GNSSDataReader(BaseModel, ABC):
    """Abstract base class for all GNSS data format readers.

    All readers must:
    1. Inherit from this class
    2. Implement all abstract methods
    3. Return xarray.Dataset that passes :func:`validate_dataset`
    4. Provide file hash for deduplication

    This ensures compatibility with:
    - canvod-vod: VOD calculation
    - canvod-store: MyIcechunkStore storage
    - canvod-grids: Grid projection operations

    Subclasses may override ``model_config`` to set ``frozen``, ``extra``,
    etc.  The base class provides ``arbitrary_types_allowed=True`` which is
    needed by readers that use ``pint.Quantity`` or similar third-party types.

    Examples
    --------
    >>> class Rnxv3Obs(GNSSDataReader):
    ...     def to_ds(self, **kwargs) -> xr.Dataset:
    ...         # Implementation
    ...         return dataset
    ...
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> ds = reader.to_ds()
    >>> validate_dataset(ds)
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path

    @field_validator("fpath")
    @classmethod
    def _validate_fpath(cls, v: Path) -> Path:
        """Validate that the file path points to an existing file."""
        v = Path(v)
        if not v.is_file():
            raise FileNotFoundError(f"File not found: {v}")
        return v

    @property
    def source_format(self) -> str:
        """Return the format identifier for this reader (e.g. ``"rinex3"``, ``"sbf"``)."""
        return "rinex3"

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """Return SHA256 hash of file for deduplication.

        Used by MyIcechunkStore to avoid duplicate ingestion.
        Must be deterministic and reproducible.

        Returns
        -------
        str
            Short hash (16 chars) or full hash of file content
        """

    @abstractmethod
    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert data to xarray.Dataset.

        Must return Dataset with structure:
        - Dims: (epoch, sid)
        - Coords: epoch, sid, sv, system, band, code, freq_*
        - Data vars: At minimum SNR
        - Attrs: Must include "File Hash"

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to include. If None, includes all available.
        **kwargs
            Implementation-specific parameters

        Returns
        -------
        xr.Dataset
            Dataset that passes :func:`validate_dataset`.
        """

    @abstractmethod
    def iter_epochs(self) -> Iterator[object]:
        """Iterate over epochs in the file.

        Yields
        ------
        Epoch
            Parsed epoch with satellites and observations.
        """

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Produce the obs dataset and any auxiliary datasets in a single call.

        Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
        Readers that produce metadata (e.g. SBF) override this to collect both
        in a single file scan.

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
            readers with no extra data (RINEX v2/v3).
        """
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

    def _build_attrs(self) -> dict[str, str]:
        """Build standard global attributes for the output Dataset.

        Reads institution/author from config, adds timestamp, version,
        and the file hash.

        Returns
        -------
        dict[str, str]
            Ready-to-use attrs dict.
        """
        from canvod.readers.gnss_specs.metadata import get_global_attrs
        from canvod.readers.gnss_specs.utils import get_version_from_pyproject

        attrs = get_global_attrs()
        attrs["Created"] = datetime.now(UTC).isoformat()
        attrs["Software"] = (
            f"{attrs['Software']}, Version: {get_version_from_pyproject()}"
        )
        attrs["File Hash"] = self.file_hash
        return attrs

    @property
    @abstractmethod
    def start_time(self) -> datetime:
        """Return start time of observations.

        Returns
        -------
        datetime
            First observation timestamp in the file.
        """

    @property
    @abstractmethod
    def end_time(self) -> datetime:
        """Return end time of observations.

        Returns
        -------
        datetime
            Last observation timestamp in the file.
        """

    @property
    @abstractmethod
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
        """

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Default implementation iterates epochs.  Subclasses may override
        with a faster approach.

        Returns
        -------
        int
            Total number of observation epochs.
        """
        return sum(1 for _ in self.iter_epochs())

    @property
    @abstractmethod
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.
        """

    def __repr__(self) -> str:
        """Return the string representation."""
        return f"{self.__class__.__name__}(file='{self.fpath.name}')"

source_format property

Return the format identifier for this reader (e.g. "rinex3", "sbf").

file_hash abstractmethod property

Return SHA256 hash of file for deduplication.

Used by MyIcechunkStore to avoid duplicate ingestion. Must be deterministic and reproducible.

Returns

str
    Short hash (16 chars) or full hash of file content.

start_time abstractmethod property

Return start time of observations.

Returns

datetime
    First observation timestamp in the file.

end_time abstractmethod property

Return end time of observations.

Returns

datetime
    Last observation timestamp in the file.

systems abstractmethod property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'.

num_epochs property

Return number of epochs in file.

Default implementation iterates epochs. Subclasses may override with a faster approach.

Returns

int
    Total number of observation epochs.

num_satellites abstractmethod property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

to_ds(keep_data_vars=None, **kwargs) abstractmethod

Convert data to xarray.Dataset.

Must return a Dataset with structure:

- Dims: (epoch, sid)
- Coords: epoch, sid, sv, system, band, code, freq_*
- Data vars: at minimum SNR
- Attrs: must include "File Hash"

Parameters

keep_data_vars : list of str, optional
    Data variables to include. If None, includes all available.
**kwargs
    Implementation-specific parameters.

Returns

xr.Dataset
    Dataset that passes validate_dataset().

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert data to xarray.Dataset.

    Must return Dataset with structure:
    - Dims: (epoch, sid)
    - Coords: epoch, sid, sv, system, band, code, freq_*
    - Data vars: At minimum SNR
    - Attrs: Must include "File Hash"

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to include. If None, includes all available.
    **kwargs
        Implementation-specific parameters

    Returns
    -------
    xr.Dataset
        Dataset that passes :func:`validate_dataset`.
    """

iter_epochs() abstractmethod

Iterate over epochs in the file.

Yields

Epoch
    Parsed epoch with satellites and observations.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def iter_epochs(self) -> Iterator[object]:
    """Iterate over epochs in the file.

    Yields
    ------
    Epoch
        Parsed epoch with satellites and observations.
    """

to_ds_and_auxiliary(keep_data_vars=None, **kwargs)

Produce the obs dataset and any auxiliary datasets in a single call.

Default: calls to_ds(**kwargs) and returns an empty auxiliary dict. Readers that produce metadata (e.g. SBF) override this to collect both in a single file scan.

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"name": aux_ds, ...}). The auxiliary dict is empty for readers with no extra data (RINEX v2/v3).
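The override pattern for readers that do carry auxiliary data can be sketched with plain Python. Dataset creation is stubbed out with dicts so the example stands alone; SbfLikeReader and the "receiver_status" key are illustrative names, not part of the canvod API:

```python
class BaseReader:
    """Stand-in for GNSSDataReader's default behaviour."""

    def to_ds(self, keep_data_vars=None, **kwargs):
        # A real reader returns an xarray.Dataset; a dict stands in here.
        return {"SNR": [42.0]}

    def to_ds_and_auxiliary(self, keep_data_vars=None, **kwargs):
        # Default: obs dataset plus an empty auxiliary dict (RINEX v2/v3 case).
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}


class SbfLikeReader(BaseReader):
    """Hypothetical reader that collects obs and metadata in one file scan."""

    def to_ds_and_auxiliary(self, keep_data_vars=None, **kwargs):
        obs = self.to_ds(keep_data_vars=keep_data_vars, **kwargs)
        aux = {"receiver_status": {"temperature": [31.5]}}
        return obs, aux


obs_ds, aux = SbfLikeReader().to_ds_and_auxiliary()
```

The point of overriding rather than calling to_ds() twice is that formats like SBF interleave observations and metadata, so both can be collected in a single pass over the file.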

Source code in packages/canvod-readers/src/canvod/readers/base.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Produce the obs dataset and any auxiliary datasets in a single call.

    Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
    Readers that produce metadata (e.g. SBF) override this to collect both
    in a single file scan.

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
        readers with no extra data (RINEX v2/v3).
    """
    return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

__repr__()

Return the string representation.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def __repr__(self) -> str:
    """Return the string representation."""
    return f"{self.__class__.__name__}(file='{self.fpath.name}')"

SignalID

Bases: BaseModel

Validated signal identifier (SV + band + code).

>>> sid = SignalID(sv="G01", band="L1", code="C")
>>> str(sid)
'G01|L1|C'
>>> sid.system
'G'

Source code in packages/canvod-readers/src/canvod/readers/base.py
class SignalID(BaseModel):
    """Validated signal identifier (SV + band + code).

    >>> sid = SignalID(sv="G01", band="L1", code="C")
    >>> str(sid)
    'G01|L1|C'
    >>> sid.system
    'G'
    """

    model_config = ConfigDict(frozen=True)

    sv: str
    band: str
    code: str

    @field_validator("sv")
    @classmethod
    def _validate_sv(cls, v: str) -> str:
        if not SV_PATTERN.match(v):
            raise ValueError(
                f"Invalid SV: {v!r} — expected system letter + 2-digit PRN "
                f"(e.g. 'G01'). Valid systems: G, R, E, C, J, S, I"
            )
        return v

    @property
    def system(self) -> str:
        """GNSS system letter (e.g. 'G' for GPS)."""
        return self.sv[0]

    @property
    def sid(self) -> str:
        """Full signal ID string ('SV|band|code')."""
        return f"{self.sv}|{self.band}|{self.code}"

    def __str__(self) -> str:
        return self.sid

    def __hash__(self) -> int:
        return hash(self.sid)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SignalID):
            return self.sid == other.sid
        return NotImplemented

    @classmethod
    def from_string(cls, sid_str: str) -> SignalID:
        """Parse a signal ID string ('SV|band|code') into a SignalID.

        Parameters
        ----------
        sid_str : str
            Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

        Returns
        -------
        SignalID
            Validated signal identifier.

        Raises
        ------
        ValueError
            If the string does not have exactly three pipe-separated parts.
        """
        parts = sid_str.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
            )
        return cls(sv=parts[0], band=parts[1], code=parts[2])

system property

GNSS system letter (e.g. 'G' for GPS).

sid property

Full signal ID string ('SV|band|code').

from_string(sid_str) classmethod

Parse a signal ID string ('SV|band|code') into a SignalID.

Parameters

sid_str : str
    Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

Returns

SignalID
    Validated signal identifier.

Raises

ValueError
    If the string does not have exactly three pipe-separated parts.
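The parsing and SV validation can be reproduced standalone. SV_PATTERN itself is defined elsewhere in base.py and is not shown in this excerpt; the regex below is an assumed equivalent (one system letter from G/R/E/C/J/S/I plus a 2-digit PRN):

```python
import re

# Assumed equivalent of base.py's SV_PATTERN (not shown in this excerpt).
SV_PATTERN = re.compile(r"^[GRECJSI]\d{2}$")


def parse_sid(sid_str: str) -> tuple[str, str, str]:
    """Split 'SV|band|code' and validate the SV part, mirroring from_string()."""
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} (expected 'SV|band|code')"
        )
    sv, band, code = parts
    if not SV_PATTERN.match(sv):
        raise ValueError(f"Invalid SV: {sv!r}")
    return sv, band, code
```

With this sketch, parse_sid("G01|L1|C") yields the three components, while a malformed string such as "G01|L1" or an unknown system letter raises ValueError, matching the behaviour documented above.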

Source code in packages/canvod-readers/src/canvod/readers/base.py
@classmethod
def from_string(cls, sid_str: str) -> SignalID:
    """Parse a signal ID string ('SV|band|code') into a SignalID.

    Parameters
    ----------
    sid_str : str
        Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

    Returns
    -------
    SignalID
        Validated signal identifier.

    Raises
    ------
    ValueError
        If the string does not have exactly three pipe-separated parts.
    """
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
        )
    return cls(sv=parts[0], band=parts[1], code=parts[2])

DatasetBuilder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Parameters

reader : GNSSDataReader
    The reader instance (used for _build_attrs() and file hash).
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels (default True).

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()  # validated Dataset

Source code in packages/canvod-readers/src/canvod/readers/builder.py
class DatasetBuilder:
    """Guided builder for constructing valid GNSSDataReader output Datasets.

    Handles coordinate arrays, dtype enforcement, frequency resolution,
    and contract validation automatically.

    Parameters
    ----------
    reader : GNSSDataReader
        The reader instance (used for ``_build_attrs()`` and file hash).
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels (default True).

    Examples
    --------
    >>> builder = DatasetBuilder(reader)
    >>> for epoch in reader.iter_epochs():
    ...     ei = builder.add_epoch(epoch.timestamp)
    ...     for obs in epoch.observations:
    ...         sig = builder.add_signal(sv="G01", band="L1", code="C")
    ...         builder.set_value(ei, sig, "SNR", 42.0)
    >>> ds = builder.build()   # validated Dataset
    """

    def __init__(
        self,
        reader: GNSSDataReader,
        *,
        aggregate_glonass_fdma: bool = True,
    ) -> None:
        self._reader = reader
        self._mapper = SignalIDMapper(aggregate_glonass_fdma=aggregate_glonass_fdma)
        self._signals: dict[str, SignalID] = {}
        self._epochs: list[datetime] = []
        self._values: dict[str, dict[tuple[int, str], float]] = {}

    def add_epoch(self, timestamp: datetime) -> int:
        """Register an epoch timestamp. Returns epoch index."""
        self._epochs.append(timestamp)
        return len(self._epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> SignalID:
        """Register a signal (idempotent). Returns validated SignalID."""
        sig = SignalID(sv=sv, band=band, code=code)
        self._signals[sig.sid] = sig
        return sig

    def set_value(
        self,
        epoch_idx: int,
        signal: SignalID | str,
        var: str,
        value: float,
    ) -> None:
        """Set a data value for a given epoch, signal, and variable.

        Parameters
        ----------
        epoch_idx : int
            Index returned by :meth:`add_epoch`.
        signal : SignalID or str
            Signal identifier (SignalID or 'SV|band|code' string).
        var : str
            Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
        value : float
            The observation value.
        """
        sid = str(signal)
        if var not in self._values:
            self._values[var] = {}
        self._values[var][(epoch_idx, sid)] = value

    def build(
        self,
        keep_data_vars: list[str] | None = None,
        extra_attrs: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Build, validate, and return the Dataset.

        1. Sorts signals alphabetically
        2. Resolves frequencies from band names via SignalIDMapper
        3. Constructs coordinate arrays with correct dtypes (float32 for freq)
        4. Attaches CF-compliant metadata from COORDS_METADATA
        5. Calls validate_dataset() before returning

        Parameters
        ----------
        keep_data_vars : list of str, optional
            If provided, only include these data variables.  If ``None``,
            includes all variables that had values set.
        extra_attrs : dict, optional
            Additional global attributes to merge into the Dataset.

        Returns
        -------
        xr.Dataset
            Validated Dataset with dimensions ``(epoch, sid)``.
        """
        sorted_sids = sorted(self._signals)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(self._epochs)
        n_sids = len(sorted_sids)

        # --- Coordinate arrays ---
        epoch_arr = [
            np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
            for ts in self._epochs
        ]
        sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
        system_arr = np.array(
            [self._signals[s].system for s in sorted_sids], dtype=object
        )
        band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
        code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

        # Frequency resolution via SignalIDMapper
        freq_center = np.array(
            [
                self._mapper.get_band_frequency(self._signals[s].band) or np.nan
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        bandwidths = np.array(
            [
                self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        freq_min = (freq_center - bandwidths / 2).astype(np.float32)
        freq_max = (freq_center + bandwidths / 2).astype(np.float32)

        # --- Determine which variables to include ---
        all_vars = set(self._values.keys())
        if keep_data_vars is not None:
            vars_to_build = [v for v in keep_data_vars if v in all_vars]
        else:
            vars_to_build = sorted(all_vars)

        # --- Data variable arrays ---
        data_vars: dict[str, tuple] = {}
        for var in vars_to_build:
            dtype = DTYPES.get(var, np.dtype("float32"))
            fill = np.nan if np.issubdtype(dtype, np.floating) else -1
            arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

            for (ei, sid_str), val in self._values[var].items():
                if sid_str in sid_to_idx:
                    arr[ei, sid_to_idx[sid_str]] = val

            meta = _VAR_METADATA.get(var, {})
            data_vars[var] = (("epoch", "sid"), arr, meta)

        # --- Coordinates ---
        coords = {
            "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
            "system": ("sid", system_arr, COORDS_METADATA["system"]),
            "band": ("sid", band_arr, COORDS_METADATA["band"]),
            "code": ("sid", code_arr, COORDS_METADATA["code"]),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        # --- Global attributes ---
        attrs = self._reader._build_attrs()
        if extra_attrs:
            attrs.update(extra_attrs)

        ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

        # Validate before returning
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

add_epoch(timestamp)

Register an epoch timestamp. Returns epoch index.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_epoch(self, timestamp: datetime) -> int:
    """Register an epoch timestamp. Returns epoch index."""
    self._epochs.append(timestamp)
    return len(self._epochs) - 1

add_signal(sv, band, code)

Register a signal (idempotent). Returns validated SignalID.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_signal(self, sv: str, band: str, code: str) -> SignalID:
    """Register a signal (idempotent). Returns validated SignalID."""
    sig = SignalID(sv=sv, band=band, code=code)
    self._signals[sig.sid] = sig
    return sig

set_value(epoch_idx, signal, var, value)

Set a data value for a given epoch, signal, and variable.

Parameters

epoch_idx : int
    Index returned by add_epoch().
signal : SignalID or str
    Signal identifier (SignalID or 'SV|band|code' string).
var : str
    Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
value : float
    The observation value.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def set_value(
    self,
    epoch_idx: int,
    signal: SignalID | str,
    var: str,
    value: float,
) -> None:
    """Set a data value for a given epoch, signal, and variable.

    Parameters
    ----------
    epoch_idx : int
        Index returned by :meth:`add_epoch`.
    signal : SignalID or str
        Signal identifier (SignalID or 'SV|band|code' string).
    var : str
        Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
    value : float
        The observation value.
    """
    sid = str(signal)
    if var not in self._values:
        self._values[var] = {}
    self._values[var][(epoch_idx, sid)] = value

build(keep_data_vars=None, extra_attrs=None)

Build, validate, and return the Dataset.

1. Sorts signals alphabetically
2. Resolves frequencies from band names via SignalIDMapper
3. Constructs coordinate arrays with correct dtypes (float32 for freq)
4. Attaches CF-compliant metadata from COORDS_METADATA
5. Calls validate_dataset() before returning

Parameters

keep_data_vars : list of str, optional
    If provided, only include these data variables. If None, includes all variables that had values set.
extra_attrs : dict, optional
    Additional global attributes to merge into the Dataset.

Returns

xr.Dataset
    Validated Dataset with dimensions (epoch, sid).
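The core densification step in build() can be isolated into a few lines: the sparse {(epoch_idx, sid): value} mapping accumulated by set_value() becomes a (n_epochs, n_sids) array with NaN wherever no value was set. The values below are made-up sample data:

```python
import numpy as np

# Sparse observations keyed by (epoch index, signal ID string).
values = {
    (0, "G01|L1|C"): 42.0,
    (1, "G01|L1|C"): 43.5,
    (1, "E05|E1|C"): 40.0,
}

# Sort signals alphabetically and map each SID to a column index,
# as build() does before allocating data arrays.
sorted_sids = sorted({sid for (_, sid) in values})
sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
n_epochs = 2

# NaN fill marks (epoch, signal) pairs with no observation.
arr = np.full((n_epochs, len(sorted_sids)), np.nan, dtype=np.float32)
for (ei, sid), val in values.items():
    arr[ei, sid_to_idx[sid]] = val
```

Here arr has one row per epoch and one column per signal; G01|L1|C has values at both epochs while E05|E1|C is NaN at epoch 0.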

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def build(
    self,
    keep_data_vars: list[str] | None = None,
    extra_attrs: dict[str, str] | None = None,
) -> xr.Dataset:
    """Build, validate, and return the Dataset.

    1. Sorts signals alphabetically
    2. Resolves frequencies from band names via SignalIDMapper
    3. Constructs coordinate arrays with correct dtypes (float32 for freq)
    4. Attaches CF-compliant metadata from COORDS_METADATA
    5. Calls validate_dataset() before returning

    Parameters
    ----------
    keep_data_vars : list of str, optional
        If provided, only include these data variables.  If ``None``,
        includes all variables that had values set.
    extra_attrs : dict, optional
        Additional global attributes to merge into the Dataset.

    Returns
    -------
    xr.Dataset
        Validated Dataset with dimensions ``(epoch, sid)``.
    """
    sorted_sids = sorted(self._signals)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(self._epochs)
    n_sids = len(sorted_sids)

    # --- Coordinate arrays ---
    epoch_arr = [
        np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
        for ts in self._epochs
    ]
    sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
    system_arr = np.array(
        [self._signals[s].system for s in sorted_sids], dtype=object
    )
    band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
    code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

    # Frequency resolution via SignalIDMapper
    freq_center = np.array(
        [
            self._mapper.get_band_frequency(self._signals[s].band) or np.nan
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    bandwidths = np.array(
        [
            self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    freq_min = (freq_center - bandwidths / 2).astype(np.float32)
    freq_max = (freq_center + bandwidths / 2).astype(np.float32)

    # --- Determine which variables to include ---
    all_vars = set(self._values.keys())
    if keep_data_vars is not None:
        vars_to_build = [v for v in keep_data_vars if v in all_vars]
    else:
        vars_to_build = sorted(all_vars)

    # --- Data variable arrays ---
    data_vars: dict[str, tuple] = {}
    for var in vars_to_build:
        dtype = DTYPES.get(var, np.dtype("float32"))
        fill = np.nan if np.issubdtype(dtype, np.floating) else -1
        arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

        for (ei, sid_str), val in self._values[var].items():
            if sid_str in sid_to_idx:
                arr[ei, sid_to_idx[sid_str]] = val

        meta = _VAR_METADATA.get(var, {})
        data_vars[var] = (("epoch", "sid"), arr, meta)

    # --- Coordinates ---
    coords = {
        "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
        "system": ("sid", system_arr, COORDS_METADATA["system"]),
        "band": ("sid", band_arr, COORDS_METADATA["band"]),
        "code": ("sid", code_arr, COORDS_METADATA["code"]),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    # --- Global attributes ---
    attrs = self._reader._build_attrs()
    if extra_attrs:
        attrs.update(extra_attrs)

    ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

    # Validate before returning
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

DatasetStructureValidator

Bases: BaseModel

Validates that an xarray.Dataset meets the GNSSDataReader contract.

Wraps a Dataset and checks it against the contract constants above. Use this in tests and reader implementations to catch structural errors early with clear messages.

Examples

validator = DatasetStructureValidator(dataset=ds)
validator.validate_all()          # raises ValueError on any violation
validator.validate_dimensions()   # check just one aspect

Source code in packages/canvod-readers/src/canvod/readers/base.py
class DatasetStructureValidator(BaseModel):
    """Validates that an xarray.Dataset meets the GNSSDataReader contract.

    Wraps a Dataset and checks it against the contract constants above.
    Use this in tests and reader implementations to catch structural errors
    early with clear messages.

    Examples
    --------
    >>> validator = DatasetStructureValidator(dataset=ds)
    >>> validator.validate_all()          # raises ValueError on any violation
    >>> validator.validate_dimensions()   # check just one aspect
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    dataset: xr.Dataset

    def validate_all(self, required_vars: list[str] | None = None) -> None:
        """Run all validations, collecting **all** errors.

        Delegates to :func:`validate_dataset` so the logic lives in one place.
        """
        validate_dataset(self.dataset, required_vars=required_vars)

    def validate_dimensions(self) -> None:
        """Check that required dimensions (epoch, sid) exist."""
        missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
        if missing:
            raise ValueError(f"Missing required dimensions: {missing}")

    def validate_coordinates(self) -> None:
        """Check that required coordinates exist with correct dtypes."""
        for coord, expected_dtype in REQUIRED_COORDS.items():
            if coord not in self.dataset.coords:
                raise ValueError(f"Missing required coordinate: {coord}")
            actual = str(self.dataset[coord].dtype)
            if expected_dtype == "object":
                is_valid_string = actual == "object" or actual.startswith("StringDType")
                if not is_valid_string:
                    raise ValueError(
                        f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                    )
            elif expected_dtype not in actual:
                raise ValueError(
                    f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
                )

    def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
        """Check that required data variables exist with correct dims."""
        if required_vars is None:
            required_vars = list(DEFAULT_REQUIRED_VARS)
        missing = set(required_vars) - set(self.dataset.data_vars)
        if missing:
            raise ValueError(f"Missing required data variables: {missing}")
        for var in self.dataset.data_vars:
            if self.dataset[var].dims != REQUIRED_DIMS:
                raise ValueError(
                    f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                    f"got {self.dataset[var].dims}"
                )

    def validate_attributes(self) -> None:
        """Check that required global attributes are present."""
        missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")

validate_all(required_vars=None)

Run all validations, collecting all errors.

Delegates to :func:validate_dataset so the logic lives in one place.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_all(self, required_vars: list[str] | None = None) -> None:
    """Run all validations, collecting **all** errors.

    Delegates to :func:`validate_dataset` so the logic lives in one place.
    """
    validate_dataset(self.dataset, required_vars=required_vars)

validate_dimensions()

Check that required dimensions (epoch, sid) exist.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dimensions(self) -> None:
    """Check that required dimensions (epoch, sid) exist."""
    missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
    if missing:
        raise ValueError(f"Missing required dimensions: {missing}")

validate_coordinates()

Check that required coordinates exist with correct dtypes.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_coordinates(self) -> None:
    """Check that required coordinates exist with correct dtypes."""
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in self.dataset.coords:
            raise ValueError(f"Missing required coordinate: {coord}")
        actual = str(self.dataset[coord].dtype)
        if expected_dtype == "object":
            is_valid_string = actual == "object" or actual.startswith("StringDType")
            if not is_valid_string:
                raise ValueError(
                    f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                )
        elif expected_dtype not in actual:
            raise ValueError(
                f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
            )

validate_data_variables(required_vars=None)

Check that required data variables exist with correct dims.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
    """Check that required data variables exist with correct dims."""
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)
    missing = set(required_vars) - set(self.dataset.data_vars)
    if missing:
        raise ValueError(f"Missing required data variables: {missing}")
    for var in self.dataset.data_vars:
        if self.dataset[var].dims != REQUIRED_DIMS:
            raise ValueError(
                f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                f"got {self.dataset[var].dims}"
            )

validate_attributes()

Check that required global attributes are present.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_attributes(self) -> None:
    """Check that required global attributes are present."""
    missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
    if missing:
        raise ValueError(f"Missing required attributes: {missing}")

Rnxv3Obs

Bases: GNSSDataReader

RINEX v3.04 observation reader.

Attributes

fpath : Path
    Path to the RINEX observation file.
polarization : str, default "RHCP"
    Polarization label for observables.
completeness_mode : {"strict", "warn", "off"}, default "strict"
    Behavior when epoch completeness checks fail.
expected_dump_interval : str or pint.Quantity, optional
    Expected file dump interval for completeness validation.
expected_sampling_interval : str or pint.Quantity, optional
    Expected sampling interval for completeness validation.
apply_overlap_filter : bool, default False
    Whether to filter overlapping signal groups.
overlap_preferences : dict[str, str], optional
    Preferred signals for overlap resolution.
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels.

Notes

Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
class Rnxv3Obs(GNSSDataReader):
    """RINEX v3.04 observation reader.

    Attributes
    ----------
    fpath : Path
        Path to the RINEX observation file.
    polarization : str, default "RHCP"
        Polarization label for observables.
    completeness_mode : {"strict", "warn", "off"}, default "strict"
        Behavior when epoch completeness checks fail.
    expected_dump_interval : str or pint.Quantity, optional
        Expected file dump interval for completeness validation.
    expected_sampling_interval : str or pint.Quantity, optional
        Expected sampling interval for completeness validation.
    apply_overlap_filter : bool, default False
        Whether to filter overlapping signal groups.
    overlap_preferences : dict[str, str], optional
        Preferred signals for overlap resolution.
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels.

    Notes
    -----
    Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
    from :class:`GNSSDataReader`.

    """

    model_config = ConfigDict(frozen=True)

    polarization: str = "RHCP"

    completeness_mode: Literal["strict", "warn", "off"] = "strict"
    expected_dump_interval: str | pint.Quantity | None = None
    expected_sampling_interval: str | pint.Quantity | None = None

    apply_overlap_filter: bool = False
    overlap_preferences: dict[str, str] | None = None

    aggregate_glonass_fdma: bool = True

    _header: Rnxv3Header = PrivateAttr()
    _signal_mapper: SignalIDMapper = PrivateAttr()

    _lines: list[str] = PrivateAttr()
    _file_hash: str = PrivateAttr()
    _cached_epoch_batches: list[tuple[int, int]] | None = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _post_init(self) -> Self:
        """Initialize derived state after validation."""
        # Load header once
        self._header = Rnxv3Header.from_file(self.fpath)

        # Initialize signal mapper
        self._signal_mapper = SignalIDMapper(
            aggregate_glonass_fdma=self.aggregate_glonass_fdma
        )

        # Optionally auto-check completeness
        if self.completeness_mode != "off":
            try:
                self.validate_epoch_completeness(
                    dump_interval=self.expected_dump_interval,
                    sampling_interval=self.expected_sampling_interval,
                )
            except MissingEpochError as e:
                if self.completeness_mode == "strict":
                    raise
                warnings.warn(str(e), RuntimeWarning, stacklevel=2)

        # Cache file lines
        self._lines = self._load_file()

        return self

    @property
    def header(self) -> Rnxv3Header:
        """Expose validated header (read-only).

        Returns
        -------
        Rnxv3Header
            Parsed and validated RINEX header.

        """
        return self._header

    def __str__(self) -> str:
        """Return a human-readable summary."""
        return (
            f"{self.__class__.__name__}:\n"
            f"  File Path: {self.fpath}\n"
            f"  Header: {self.header}\n"
            f"  Polarization: {self.polarization}\n"
        )

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return f"{self.__class__.__name__}(fpath={self.fpath})"

    def _load_file(self) -> list[str]:
        """Read file once, cache lines, and compute hash.

        Returns
        -------
        list[str]
            File contents split into lines.

        """
        if not hasattr(self, "_lines"):
            h = hashlib.sha256()
            with self.fpath.open("rb") as f:  # binary mode for consistent hash
                data = f.read()
                h.update(data)
                self._lines = data.decode("utf-8", errors="replace").splitlines()
            self._file_hash = h.hexdigest()[:16]  # short hash for storage
        return self._lines

    @property
    def file_hash(self) -> str:
        """Return cached SHA256 short hash of the file content.

        Returns
        -------
        str
            16-character short hash for deduplication.

        """
        return self._file_hash

    @property
    def start_time(self) -> datetime:
        """Return start time of observations from header.

        Returns
        -------
        datetime
            First observation timestamp.

        """
        return min(self.header.t0.values())

    @property
    def end_time(self) -> datetime:
        """Return end time of observations from last epoch.

        Returns
        -------
        datetime
            Last observation timestamp.

        """
        last_epoch = None
        for epoch in self.iter_epochs():
            last_epoch = epoch
        if last_epoch:
            return self.get_datetime_from_epoch_record_info(last_epoch.info)
        return self.start_time

    @property
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers (G, R, E, C, J, S, I).

        """
        if self.header.systems == "M":
            return list(self.header.obs_codes_per_system.keys())
        return [self.header.systems]

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Returns
        -------
        int
            Total epoch count.

        """
        return len(list(self.get_epoch_record_batches()))

    @property
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.

        """
        satellites = set()
        for epoch in self.iter_epochs():
            for sat in epoch.data:
                satellites.add(sat.sv)
        return len(satellites)

    def get_epoch_record_batches(
        self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
    ) -> list[tuple[int, int]]:
        """Get the start and end line numbers for each epoch in the file.

        Parameters
        ----------
        epoch_record_indicator : str, default '>'
            Character marking epoch record lines.

        Returns
        -------
        list of tuple of int
            List of (start_line, end_line) pairs for each epoch.

        """
        if self._cached_epoch_batches is not None:
            return self._cached_epoch_batches

        lines = self._load_file()
        starts = [
            i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
        ]
        starts.append(len(lines))  # Add EOF
        self._cached_epoch_batches = [
            (start, starts[i + 1])
            for i, start in enumerate(starts)
            if i + 1 < len(starts)
        ]
        return self._cached_epoch_batches

    def parse_observation_slice(
        self,
        slice_text: str,
    ) -> tuple[float | None, int | None, int | None]:
        """Parse a RINEX observation slice into value, LLI, and SSI.

        Enhanced to handle both standard 16-character format and
        variable-length records.

        Parameters
        ----------
        slice_text : str
            Observation slice to parse.

        Returns
        -------
        tuple[float | None, int | None, int | None]
            Parsed (value, LLI, SSI) tuple.

        """
        if not slice_text or not slice_text.strip():
            return None, None, None

        try:
            # Method 1: Standard RINEX format with decimal at position -6
            if (
                len(slice_text) >= OBS_SLICE_MIN_LEN
                and len(slice_text) <= OBS_SLICE_MAX_LEN
                and slice_text[OBS_SLICE_DECIMAL_POS] == "."
            ):
                slice_chars = list(slice_text)
                ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
                lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

                # Convert LLI and SSI
                lli = int(lli) if lli.strip() and lli.isdigit() else None
                ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

                # Convert value
                value_str = "".join(slice_chars).strip()
                if value_str:
                    value = float(value_str)
                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        try:
            # Method 2: Flexible parsing for variable-length records
            slice_trimmed = slice_text.strip()
            if not slice_trimmed:
                return None, None, None

            # Look for a decimal point to identify the numeric value
            if "." in slice_trimmed:
                # Find the main numeric value (supports negative numbers)
                number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

                if number_match:
                    value = float(number_match.group(1))

                    # Check for LLI/SSI indicators after the number
                    remaining_part = slice_trimmed[number_match.end() :].strip()
                    lli = None
                    ssi = None

                    # Parse remaining characters as potential LLI/SSI
                    if remaining_part:
                        # Could be just SSI, or LLI followed by SSI
                        if len(remaining_part) == 1:
                            # Just one indicator - assume it's SSI
                            if remaining_part.isdigit():
                                ssi = int(remaining_part)
                        elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                            # Two or more characters - take last two as LLI, SSI
                            lli_char = remaining_part[-2]
                            ssi_char = remaining_part[-1]

                            if lli_char.isdigit():
                                lli = int(lli_char)
                            if ssi_char.isdigit():
                                ssi = int(ssi_char)

                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        # Method 3: Last resort - try simple float parsing
        try:
            simple_value = float(slice_text.strip())
            return simple_value, None, None
        except ValueError:
            pass

        return None, None, None

    def process_satellite_data(self, s: str) -> Satellite:
        """Process satellite data line into a Satellite object with observations.

        Handles variable-length observation records correctly by adaptively parsing
        based on the actual line length and content.
        """
        sv = s[:3].strip()
        satellite = Satellite(sv=sv)
        bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

        # Get the data part (after sv identifier)
        data_part = s[3:]

        # Process each observation adaptively
        for i, band in enumerate(bands_tbe):
            start_idx = i * 16
            end_idx = start_idx + 16

            # Check if we have enough data for this observation
            if start_idx >= len(data_part):
                # No more data available - create empty observation
                observation = Observation(
                    obs_type=band.split("|")[1][0],
                    value=None,
                    lli=None,
                    ssi=None,
                )
                satellite.add_observation(observation)
                continue

            # Extract the slice, but handle variable length
            if end_idx <= len(data_part):
                # Full 16-character slice available
                slice_data = data_part[start_idx:end_idx]
            else:
                # Partial slice - pad with spaces to maintain consistency
                available_slice = data_part[start_idx:]
                slice_data = available_slice.ljust(16)  # Pad with spaces if needed

            value, lli, ssi = self.parse_observation_slice(slice_data)

            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=value,
                lli=lli,
                ssi=ssi,
            )
            satellite.add_observation(observation)

        return satellite

    @property
    def epochs(self) -> list[Rnxv3ObsEpochRecord]:
        """Materialize all epochs (legacy compatibility).

        Returns
        -------
        list of Rnxv3ObsEpochRecord
            All epochs in memory (use :meth:`iter_epochs` for efficiency).

        """
        return list(self.iter_epochs())

    def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
        """Yield epochs one by one instead of materializing the whole list.

        Returns
        -------
        Generator
            Generator yielding Rnxv3ObsEpochRecord objects

        Yields
        ------
        Rnxv3ObsEpochRecord
            Each epoch with timestamp and satellite observations

        """
        for start, end in self.get_epoch_record_batches():
            try:
                info = Rnxv3ObsEpochRecordLineModel.model_validate(
                    {"epoch": self._lines[start]}
                )

                # Skip event epochs (flag 2-6: special records, not observations)
                if info.epoch_flag > 1:
                    continue

                # Filter out blank/whitespace-only lines from data slice
                data = [line for line in self._lines[start + 1 : end] if line.strip()]
                epoch = Rnxv3ObsEpochRecord(
                    info=info,
                    data=[self.process_satellite_data(line) for line in data],
                )
                yield epoch
            except (InvalidEpochError, IncompleteEpochError, ValueError):
                # Skip epochs with validation errors (invalid SV, malformed data,
                # pydantic ValidationError inherits from ValueError)
                pass

    def iter_epochs_in_range(
        self,
        start: datetime,
        end: datetime,
    ) -> Iterable[Rnxv3ObsEpochRecord]:
        """Yield epochs lazily that fall into the given datetime range.

        Parameters
        ----------
        start : datetime
            Start of time range (inclusive)
        end : datetime
            End of time range (inclusive)

        Returns
        -------
        Generator
            Generator yielding epochs in the specified range

        Yields
        ------
        Rnxv3ObsEpochRecord
            Epochs within the time range

        """
        for epoch in self.iter_epochs():
            dt = self.get_datetime_from_epoch_record_info(epoch.info)
            if start <= dt <= end:
                yield epoch

    def get_datetime_from_epoch_record_info(
        self,
        epoch_record_info: Rnxv3ObsEpochRecordLineModel,
    ) -> datetime:
        """Convert epoch record info to datetime object.

        Parameters
        ----------
        epoch_record_info : Rnxv3ObsEpochRecordLineModel
            Parsed epoch record line

        Returns
        -------
        datetime
            Timestamp from epoch record

        """
        return datetime(
            year=int(epoch_record_info.year),
            month=int(epoch_record_info.month),
            day=int(epoch_record_info.day),
            hour=int(epoch_record_info.hour),
            minute=int(epoch_record_info.minute),
            second=int(epoch_record_info.seconds),
            tzinfo=UTC,
        )

    @staticmethod
    def epochrecordinfo_dt_to_numpy_dt(
        epch: Rnxv3ObsEpochRecord,
    ) -> np.datetime64:
        """Convert Python datetime to numpy datetime64[ns].

        Parameters
        ----------
        epch : Rnxv3ObsEpochRecord
            Epoch record containing timestamp info

        Returns
        -------
        np.datetime64
            Numpy datetime64 with nanosecond precision

        """
        dt = datetime(
            year=int(epch.info.year),
            month=int(epch.info.month),
            day=int(epch.info.day),
            hour=int(epch.info.hour),
            minute=int(epch.info.minute),
            second=int(epch.info.seconds),
            tzinfo=UTC,
        )
        # np.datetime64 doesn't support timezone info, but datetime is already UTC
        # Convert to naive datetime (UTC) to avoid warning
        return np.datetime64(dt.replace(tzinfo=None), "ns")

    def _epoch_datetimes(self) -> list[datetime]:
        """Extract epoch datetimes from the file.

        Uses the same epoch parsing logic already implemented.
        """
        dts: list[datetime] = []

        for start, _end in self.get_epoch_record_batches():
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )
            dts.append(
                datetime(
                    year=int(info.year),
                    month=int(info.month),
                    day=int(info.day),
                    hour=int(info.hour),
                    minute=int(info.minute),
                    second=int(info.seconds),
                    tzinfo=UTC,
                )
            )
        return dts

    def infer_sampling_interval(self) -> pint.Quantity | None:
        """Infer sampling interval from consecutive epoch deltas.

        Returns
        -------
        pint.Quantity or None
            Sampling interval in seconds, or None if cannot be inferred

        """
        dts = self._epoch_datetimes()
        if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
            return None
        # Compute deltas
        deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
        if not deltas:
            return None
        # Pick the most common delta (robust to an occasional missing epoch)
        seconds = Counter(
            int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
        )
        if not seconds:
            return None
        mode_seconds, _ = seconds.most_common(1)[0]
        return (mode_seconds * UREG.second).to(UREG.seconds)

    def infer_dump_interval(
        self, sampling_interval: pint.Quantity | None = None
    ) -> pint.Quantity | None:
        """Infer the intended dump interval for the RINEX file.

        Parameters
        ----------
        sampling_interval : pint.Quantity, optional
            Known sampling interval. If provided, returns (#epochs * sampling_interval)

        Returns
        -------
        pint.Quantity or None
            Dump interval in seconds, or None if cannot be inferred

        """
        idx = self.get_epoch_record_batches()
        n_epochs = len(idx)
        if n_epochs == 0:
            return None

        if sampling_interval is not None:
            return (n_epochs * sampling_interval).to(UREG.seconds)

        # Fallback: estimate the step from epoch deltas, then scale by epoch count
        dts = self._epoch_datetimes()
        if len(dts) == 0:
            return None
        if len(dts) == 1:
            # Single epoch: the step is unknown, so the interval cannot be inferred
            return None

        # Estimate step from data
        est_step = self.infer_sampling_interval()
        if est_step is None:
            return None

        # Inclusive coverage often equals (n_epochs - 1) * step; intended
        # dump interval is n_epochs * step.
        return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

    def validate_epoch_completeness(
        self,
        dump_interval: str | pint.Quantity | None = None,
        sampling_interval: str | pint.Quantity | None = None,
    ) -> None:
        """Validate that the number of epochs matches the expected dump interval.

        Parameters
        ----------
        dump_interval : str or pint.Quantity, optional
            Expected file dump interval. If None, inferred from epochs.
        sampling_interval : str or pint.Quantity, optional
            Expected sampling interval. If None, inferred from epochs.

        Returns
        -------
        None

        Raises
        ------
        MissingEpochError
            If total sampling time doesn't match dump interval
        ValueError
            If intervals cannot be inferred

        """
        # Normalize/Infer sampling interval
        if sampling_interval is None:
            inferred = self.infer_sampling_interval()
            if inferred is None:
                msg = "Could not infer sampling interval from epochs"
                raise ValueError(msg)
            sampling_interval = inferred
        # normalize to pint
        elif not isinstance(sampling_interval, pint.Quantity):
            sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

        # Normalize/Infer dump interval
        if dump_interval is None:
            inferred_dump = self.infer_dump_interval(
                sampling_interval=sampling_interval
            )
            if inferred_dump is None:
                msg = "Could not infer dump interval from file"
                raise ValueError(msg)
            dump_interval = inferred_dump
        elif not isinstance(dump_interval, pint.Quantity):
            # Accept '15 min', '1h', etc.
            dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

        # Build inputs for the validator model
        epoch_indices = self.get_epoch_record_batches()

        # This throws MissingEpochError automatically if inconsistent
        cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
            epoch_records_indeces=epoch_indices,
            rnx_file_dump_interval=dump_interval,
            sampling_interval=sampling_interval,
        )

    def filter_by_overlapping_groups(
        self,
        ds: xr.Dataset,
        group_preference: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Filter overlapping bands using per-group preferences.

        Parameters
        ----------
        ds : xr.Dataset
            Dataset with `sid` dimension and signal properties.
        group_preference : dict[str, str], optional
            Mapping of overlap group to preferred band.

        Returns
        -------
        xr.Dataset
            Dataset filtered to preferred overlapping bands.

        """
        if group_preference is None:
            group_preference = {
                "L1_E1_B1I": "L1",
                "L5_E5a": "L5",
                "L2_E5b_B2b": "L2",
            }

        keep = []
        for sid in ds.sid.values:
            parts = str(sid).split("|")
            band = parts[1] if len(parts) >= 2 else ""
            group = self._signal_mapper.get_overlapping_group(band)
            if group and group in group_preference:
                if band == group_preference[group]:
                    keep.append(sid)
            else:
                keep.append(sid)
        return ds.sel(sid=keep)

    def _precompute_sids_from_header(
        self,
    ) -> tuple[list[str], dict[str, dict[str, object]]]:
        """Build sorted SID list and properties from header info alone.

        Uses the header's obs_codes_per_system and static constellation
        SV lists to pre-compute the full theoretical SID set, eliminating
        the discovery pass.

        Returns
        -------
        sorted_sids : list[str]
            Sorted list of signal IDs.
        sid_properties : dict[str, dict[str, object]]
            Mapping of SID to its properties (sv, system, band, code,
            freq_center, freq_min, freq_max, bandwidth, overlapping_group).

        """
        mapper = self._signal_mapper
        signal_ids: set[str] = set()
        sid_properties: dict[str, dict[str, object]] = {}

        # Pre-compute pint arithmetic once per unique band
        band_freq_cache: dict[str, tuple[float, float, float, float]] = {}

        for system, obs_codes in self.header.obs_codes_per_system.items():
            svs = _get_constellation_svs(system)

            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]

                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )

                # Cache frequency arithmetic per band
                if band_name not in band_freq_cache:
                    center_frequency = mapper.get_band_frequency(band_name)
                    bandwidth = mapper.get_band_bandwidth(band_name)

                    if center_frequency is not None and bandwidth is not None:
                        bw = bandwidth[0] if isinstance(bandwidth, list) else bandwidth
                        freq_min = center_frequency - (bw / 2.0)
                        freq_max = center_frequency + (bw / 2.0)
                        band_freq_cache[band_name] = (
                            float(center_frequency),
                            float(freq_min),
                            float(freq_max),
                            float(bw),
                        )
                    else:
                        band_freq_cache[band_name] = (
                            np.nan,
                            np.nan,
                            np.nan,
                            np.nan,
                        )

                freq_center, freq_min, freq_max, bw = band_freq_cache[band_name]
                overlapping_group = mapper.get_overlapping_group(band_name)

                sid_suffix = "|" + band_name + "|" + code_char

                for sv in svs:
                    sid = sv + sid_suffix
                    if sid not in signal_ids:
                        signal_ids.add(sid)
                        sid_properties[sid] = {
                            "sv": sv,
                            "system": system,
                            "band": band_name,
                            "code": code_char,
                            "freq_center": freq_center,
                            "freq_min": freq_min,
                            "freq_max": freq_max,
                            "bandwidth": bw,
                            "overlapping_group": overlapping_group,
                        }

        sorted_sids = sorted(signal_ids)
        return sorted_sids, {s: sid_properties[s] for s in sorted_sids}

    def _create_dataset_single_pass(self) -> xr.Dataset:
        """Create xarray Dataset in a single pass over the file.

        Pre-allocates arrays using header-derived SID set and epoch count,
        then fills them by parsing observations inline without Pydantic
        models or function-call overhead.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and standard variables.

        """
        lines = self._load_file()
        epoch_batches = self.get_epoch_record_batches()
        n_epochs = len(epoch_batches)

        sorted_sids, sid_properties = self._precompute_sids_from_header()
        n_sids = len(sorted_sids)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}

        # Pre-allocate arrays
        timestamps = np.empty(n_epochs, dtype="datetime64[ns]")
        snr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pseudo = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        phase = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        doppler = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        lli = np.full((n_epochs, n_sids), -1, dtype=DTYPES["LLI"])
        ssi = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        # Build obs_code → (obs_type, sid_suffix) lookup per system
        mapper = self._signal_mapper
        system_obs_lut: dict[str, list[tuple[str, str]]] = {}
        for system, obs_codes in self.header.obs_codes_per_system.items():
            lut: list[tuple[str, str]] = []
            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    lut.append(("", ""))
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]
                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )
                obs_type = obs_code[0]
                lut.append((obs_type, "|" + band_name + "|" + code_char))
            system_obs_lut[system] = lut

        # Single pass over all epochs — skip unparseable epoch lines
        valid_mask = np.ones(n_epochs, dtype=bool)
        for t_idx, (start, end) in enumerate(epoch_batches):
            epoch_line = lines[start]

            # Inline epoch parsing (no Pydantic model)
            m = _EPOCH_RE.match(epoch_line)
            if m is None:
                valid_mask[t_idx] = False
                continue

            year, month, day = int(m[1]), int(m[2]), int(m[3])
            hour, minute = int(m[4]), int(m[5])
            seconds = float(m[6])
            sec_int = int(seconds)
            usec = int((seconds - sec_int) * 1_000_000)
            ts = np.datetime64(
                f"{year:04d}-{month:02d}-{day:02d}"
                f"T{hour:02d}:{minute:02d}:{sec_int:02d}",
                "ns",
            )
            ts += np.timedelta64(usec, "us")
            timestamps[t_idx] = ts

            # Parse satellite data lines inline
            for line_idx in range(start + 1, end):
                sat_line = lines[line_idx]
                if len(sat_line) < 3:
                    continue
                sv = sat_line[:3].strip()
                if not sv:
                    continue
                system = sv[0]
                lut_list = system_obs_lut.get(system)
                if lut_list is None:
                    continue

                data_part = sat_line[3:]
                data_part_len = len(data_part)

                for i, (obs_type, sid_suffix) in enumerate(lut_list):
                    if not obs_type:
                        continue

                    col_start = i * 16
                    if col_start >= data_part_len:
                        break

                    sid_key = sv + sid_suffix
                    s_idx = sid_to_idx.get(sid_key)
                    if s_idx is None:
                        continue

                    col_end = col_start + 16
                    slice_text = data_part[col_start:col_end]

                    value, obs_lli, obs_ssi = _parse_obs_fast(slice_text)
                    if value is None:
                        continue

                    if obs_type == "S":
                        if value != 0:
                            snr[t_idx, s_idx] = value
                    elif obs_type == "C":
                        pseudo[t_idx, s_idx] = value
                    elif obs_type == "L":
                        phase[t_idx, s_idx] = value
                    elif obs_type == "D":
                        doppler[t_idx, s_idx] = value

                    if obs_lli is not None:
                        lli[t_idx, s_idx] = obs_lli
                    if obs_ssi is not None:
                        ssi[t_idx, s_idx] = obs_ssi

        # Drop epochs that failed to parse
        if not valid_mask.all():
            timestamps = timestamps[valid_mask]
            snr = snr[valid_mask]
            pseudo = pseudo[valid_mask]
            phase = phase[valid_mask]
            doppler = doppler[valid_mask]
            lli = lli[valid_mask]
            ssi = ssi[valid_mask]

        # Build coordinate arrays from pre-computed properties
        sv_list = np.array(
            [sid_properties[sid]["sv"] for sid in sorted_sids], dtype=object
        )
        constellation_list = np.array(
            [sid_properties[sid]["system"] for sid in sorted_sids], dtype=object
        )
        band_list = np.array(
            [sid_properties[sid]["band"] for sid in sorted_sids], dtype=object
        )
        code_list = np.array(
            [sid_properties[sid]["code"] for sid in sorted_sids], dtype=object
        )
        freq_center_list = [sid_properties[sid]["freq_center"] for sid in sorted_sids]
        freq_min_list = [sid_properties[sid]["freq_min"] for sid in sorted_sids]
        freq_max_list = [sid_properties[sid]["freq_max"] for sid in sorted_sids]

        signal_id_coord = xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        )
        coords = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": signal_id_coord,
            "sv": ("sid", sv_list, COORDS_METADATA["sv"]),
            "system": ("sid", constellation_list, COORDS_METADATA["system"]),
            "band": ("sid", band_list, COORDS_METADATA["band"]),
            "code": ("sid", code_list, COORDS_METADATA["code"]),
            "freq_center": (
                "sid",
                np.asarray(freq_center_list, dtype=DTYPES["freq_center"]),
                COORDS_METADATA["freq_center"],
            ),
            "freq_min": (
                "sid",
                np.asarray(freq_min_list, dtype=DTYPES["freq_min"]),
                COORDS_METADATA["freq_min"],
            ),
            "freq_max": (
                "sid",
                np.asarray(freq_max_list, dtype=DTYPES["freq_max"]),
                COORDS_METADATA["freq_max"],
            ),
        }

        if self.header.signal_strength_unit == UREG.dBHz:
            snr_meta = CN0_METADATA
        else:
            snr_meta = SNR_METADATA

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr, snr_meta),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pseudo,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (
                    ["epoch", "sid"],
                    phase,
                    OBSERVABLES_METADATA["Phase"],
                ),
                "Doppler": (
                    ["epoch", "sid"],
                    doppler,
                    OBSERVABLES_METADATA["Doppler"],
                ),
                "LLI": (
                    ["epoch", "sid"],
                    lli,
                    OBSERVABLES_METADATA["LLI"],
                ),
                "SSI": (
                    ["epoch", "sid"],
                    ssi,
                    OBSERVABLES_METADATA["SSI"],
                ),
            },
            coords=coords,
            attrs={**self._build_attrs()},
        )

        if self.apply_overlap_filter:
            ds = self.filter_by_overlapping_groups(ds, self.overlap_preferences)

        return ds

    def create_rinex_netcdf_with_signal_id(
        self,
        start: datetime | None = None,
        end: datetime | None = None,
    ) -> xr.Dataset:
        """Create a NetCDF dataset with signal IDs.

        Always uses the fast single-pass path.  Optionally restricts to
        epochs within a datetime range via post-filtering.

        Parameters
        ----------
        start : datetime, optional
            Start of time range (inclusive).
        end : datetime, optional
            End of time range (inclusive).

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid).

        """
        ds = self._create_dataset_single_pass()

        if start is not None or end is not None:
            ds = ds.sel(epoch=slice(start, end))

        return ds

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert RINEX observations to xarray.Dataset with signal ID structure.

        Parameters
        ----------
        outname : Path or str, optional
            If provided, saves dataset to this file path
        keep_data_vars : list of str or None, optional
            Data variables to include in dataset. Defaults to config value.
        write_global_attrs : bool, default False
            If True, adds comprehensive global attributes
        pad_global_sid : bool, default True
            If True, pads to global signal ID space
        strip_fillval : bool, default True
            If True, removes fill values
        add_future_datavars : bool, default True
            If True, adds placeholder variables for future data
        keep_sids : list of str or None, default None
            If provided, filters/pads dataset to these specific SIDs.
            If None and pad_global_sid=True, pads to all possible SIDs.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and requested data variables

        """
        outname = cast(Path | str | None, kwargs.pop("outname", None))
        write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
        pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
        strip_fillval = bool(kwargs.pop("strip_fillval", True))
        add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
        keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

        if keep_data_vars is None:
            from canvod.utils.config import load_config

            keep_data_vars = load_config().processing.processing.keep_rnx_vars

        ds = self.create_rinex_netcdf_with_signal_id()

        # drop unwanted vars
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            # Pad/filter to specified sids or all possible sids
            ds = pad_to_global_sid(ds, keep_sids=keep_sids)

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        if add_future_datavars:
            # Placeholder: adding future data variables is not yet implemented
            pass

        if write_global_attrs:
            ds.attrs.update(self._create_comprehensive_attrs())

        ds.attrs.update(self._build_attrs())

        if outname:
            from canvod.utils.config import load_config as _load_config

            comp = _load_config().processing.compression
            encoding = {
                var: {"zlib": comp.zlib, "complevel": comp.complevel}
                for var in ds.data_vars
            }
            ds.to_netcdf(str(outname), encoding=encoding)

        # Validate output structure for pipeline compatibility
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

    def validate_rinex_304_compliance(
        self,
        ds: xr.Dataset | None = None,
        strict: bool = False,
        print_report: bool = True,
    ) -> dict[str, list[str]]:
        """Run enhanced RINEX 3.04 specification validation.

        Validates:
        1. System-specific observation codes
        2. GLONASS mandatory fields (slot/frequency, biases)
        3. Phase shift records (RINEX 3.01+)
        4. Observation value ranges

        Parameters
        ----------
        ds : xr.Dataset, optional
            Dataset to validate. If None, creates one from current file.
        strict : bool
            If True, raise ValueError on validation failures
        print_report : bool
            If True, print validation report to console

        Returns
        -------
        dict[str, list[str]]
            Validation results by category

        Examples
        --------
        >>> reader = Rnxv3Obs(fpath="station.24o")
        >>> results = reader.validate_rinex_304_compliance()
        >>> # Or validate a specific dataset
        >>> ds = reader.to_ds()
        >>> results = reader.validate_rinex_304_compliance(ds=ds)

        """
        if ds is None:
            ds = self.to_ds(write_global_attrs=False)

        # Prepare header dict for validators
        header_dict: dict[str, Any] = {
            "obs_codes_per_system": self.header.obs_codes_per_system,
        }

        # Add GLONASS-specific headers if available
        if hasattr(self.header, "glonass_slot_frq"):
            header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

        if hasattr(self.header, "glonass_cod_phs_bis"):
            header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

        if hasattr(self.header, "phase_shift"):
            header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

        # Run validation
        results = RINEX304ComplianceValidator.validate_all(
            ds=ds, header_dict=header_dict, strict=strict
        )

        if print_report:
            RINEX304ComplianceValidator.print_validation_report(results)

        return results

    def _create_comprehensive_attrs(self) -> dict[str, object]:
        attrs: dict[str, object] = {
            "File Path": str(self.fpath),
            "File Type": self.header.filetype,
            "RINEX Version": self.header.version,
            "RINEX Type": self.header.rinextype,
            "Observer": self.header.observer,
            "Agency": self.header.agency,
            "Date": self.header.date.isoformat(),
            "Marker Name": self.header.marker_name,
            "Marker Number": self.header.marker_number,
            "Marker Type": self.header.marker_type,
            "Approximate Position": (
                f"(X = {self.header.approx_position[0].magnitude} "
                f"{self.header.approx_position[0].units:~}, "
                f"Y = {self.header.approx_position[1].magnitude} "
                f"{self.header.approx_position[1].units:~}, "
                f"Z = {self.header.approx_position[2].magnitude} "
                f"{self.header.approx_position[2].units:~})"
            ),
            "Receiver Type": self.header.receiver_type,
            "Receiver Version": self.header.receiver_version,
            "Receiver Number": self.header.receiver_number,
            "Antenna Type": self.header.antenna_type,
            "Antenna Number": self.header.antenna_number,
            "Antenna Position": (
                f"(X = {self.header.antenna_position[0].magnitude} "
                f"{self.header.antenna_position[0].units:~}, "
                f"Y = {self.header.antenna_position[1].magnitude} "
                f"{self.header.antenna_position[1].units:~}, "
                f"Z = {self.header.antenna_position[2].magnitude} "
                f"{self.header.antenna_position[2].units:~})"
            ),
            "Program": self.header.pgm,
            "Run By": self.header.run_by,
            "Time of First Observation": json.dumps(
                {k: v.isoformat() for k, v in self.header.t0.items()}
            ),
            "GLONASS COD": self.header.glonass_cod,
            "GLONASS PHS": self.header.glonass_phs,
            "GLONASS BIS": self.header.glonass_bis,
            "GLONASS Slot Frequency Dict": json.dumps(
                self.header.glonass_slot_freq_dict
            ),
            "Leap Seconds": f"{self.header.leap_seconds:~}",
        }
        return attrs

header property

Expose validated header (read-only).

Returns

Rnxv3Header
    Parsed and validated RINEX header.

file_hash property

Return cached SHA256 short hash of the file content.

Returns

str
    16-character short hash for deduplication.

start_time property

Return start time of observations from header.

Returns

datetime
    First observation timestamp.

end_time property

Return end time of observations from last epoch.

Returns

datetime
    Last observation timestamp.

systems property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers (G, R, E, C, J, S, I).

num_epochs property

Return number of epochs in file.

Returns

int
    Total epoch count.

num_satellites property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

epochs property

Materialize all epochs (legacy compatibility).

Returns

list of Rnxv3ObsEpochRecord
    All epochs in memory (use iter_epochs for efficiency).

__str__()

Return a human-readable summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable summary."""
    return (
        f"{self.__class__.__name__}:\n"
        f"  File Path: {self.fpath}\n"
        f"  Header: {self.header}\n"
        f"  Polarization: {self.polarization}\n"
    )

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
817
818
819
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return f"{self.__class__.__name__}(fpath={self.fpath})"

get_epoch_record_batches(epoch_record_indicator=EPOCH_RECORD_INDICATOR)

Get the start and end line numbers for each epoch in the file.

Parameters

epoch_record_indicator : str, default '>'
    Character marking epoch record lines.

Returns

list of tuple of int
    List of (start_line, end_line) pairs for each epoch.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_epoch_record_batches(
    self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
) -> list[tuple[int, int]]:
    """Get the start and end line numbers for each epoch in the file.

    Parameters
    ----------
    epoch_record_indicator : str, default '>'
        Character marking epoch record lines.

    Returns
    -------
    list of tuple of int
        List of (start_line, end_line) pairs for each epoch.

    """
    if self._cached_epoch_batches is not None:
        return self._cached_epoch_batches

    lines = self._load_file()
    starts = [
        i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
    ]
    starts.append(len(lines))  # Add EOF
    self._cached_epoch_batches = [
        (start, starts[i + 1])
        for i, start in enumerate(starts)
        if i + 1 < len(starts)
    ]
    return self._cached_epoch_batches
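The batching logic above can be illustrated on a toy line list: epoch records start with `>`, and each batch spans from one marker up to (but not including) the next, with an EOF sentinel closing the final batch. A minimal sketch (the sample lines are hypothetical, not real RINEX content):

```python
def epoch_batches(lines: list[str], indicator: str = ">") -> list[tuple[int, int]]:
    """Pair each epoch-record line index with the start of the next (or EOF)."""
    starts = [i for i, line in enumerate(lines) if line.startswith(indicator)]
    starts.append(len(lines))  # sentinel: EOF closes the final batch
    return [(s, starts[i + 1]) for i, s in enumerate(starts) if i + 1 < len(starts)]


lines = ["> epoch 1", "G01 ...", "G02 ...", "> epoch 2", "G01 ..."]
print(epoch_batches(lines))  # → [(0, 3), (3, 5)]
```

Each `(start, end)` pair gives the epoch line at `start` and the satellite data lines in `start + 1 .. end - 1`, which is exactly how `_create_dataset_single_pass` consumes the batches.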

parse_observation_slice(slice_text)

Parse a RINEX observation slice into value, LLI, and SSI.

Enhanced to handle both standard 16-character format and variable-length records.

Parameters

slice_text : str
    Observation slice to parse.

Returns

tuple[float | None, int | None, int | None]
    Parsed (value, LLI, SSI) tuple.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def parse_observation_slice(
    self,
    slice_text: str,
) -> tuple[float | None, int | None, int | None]:
    """Parse a RINEX observation slice into value, LLI, and SSI.

    Enhanced to handle both standard 16-character format and
    variable-length records.

    Parameters
    ----------
    slice_text : str
        Observation slice to parse.

    Returns
    -------
    tuple[float | None, int | None, int | None]
        Parsed (value, LLI, SSI) tuple.

    """
    if not slice_text or not slice_text.strip():
        return None, None, None

    try:
        # Method 1: Standard RINEX format with decimal at position -6
        if (
            len(slice_text) >= OBS_SLICE_MIN_LEN
            and len(slice_text) <= OBS_SLICE_MAX_LEN
            and slice_text[OBS_SLICE_DECIMAL_POS] == "."
        ):
            slice_chars = list(slice_text)
            ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
            lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

            # Convert LLI and SSI
            lli = int(lli) if lli.strip() and lli.isdigit() else None
            ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

            # Convert value
            value_str = "".join(slice_chars).strip()
            if value_str:
                value = float(value_str)
                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    try:
        # Method 2: Flexible parsing for variable-length records
        slice_trimmed = slice_text.strip()
        if not slice_trimmed:
            return None, None, None

        # Look for a decimal point to identify the numeric value
        if "." in slice_trimmed:
            # Find the main numeric value (supports negative numbers)
            number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

            if number_match:
                value = float(number_match.group(1))

                # Check for LLI/SSI indicators after the number
                remaining_part = slice_trimmed[number_match.end() :].strip()
                lli = None
                ssi = None

                # Parse remaining characters as potential LLI/SSI
                if remaining_part:
                    # Could be just SSI, or LLI followed by SSI
                    if len(remaining_part) == 1:
                        # Just one indicator - assume it's SSI
                        if remaining_part.isdigit():
                            ssi = int(remaining_part)
                    elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                        # Two or more characters - take last two as LLI, SSI
                        lli_char = remaining_part[-2]
                        ssi_char = remaining_part[-1]

                        if lli_char.isdigit():
                            lli = int(lli_char)
                        if ssi_char.isdigit():
                            ssi = int(ssi_char)

                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    # Method 3: Last resort - try simple float parsing
    try:
        simple_value = float(slice_text.strip())
        return simple_value, None, None
    except ValueError:
        pass

    return None, None, None

process_satellite_data(s)

Process satellite data line into a Satellite object with observations.

Handles variable-length observation records correctly by adaptively parsing based on the actual line length and content.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def process_satellite_data(self, s: str) -> Satellite:
    """Process satellite data line into a Satellite object with observations.

    Handles variable-length observation records correctly by adaptively parsing
    based on the actual line length and content.
    """
    sv = s[:3].strip()
    satellite = Satellite(sv=sv)
    bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

    # Get the data part (after sv identifier)
    data_part = s[3:]

    # Process each observation adaptively
    for i, band in enumerate(bands_tbe):
        start_idx = i * 16
        end_idx = start_idx + 16

        # Check if we have enough data for this observation
        if start_idx >= len(data_part):
            # No more data available - create empty observation
            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=None,
                lli=None,
                ssi=None,
            )
            satellite.add_observation(observation)
            continue

        # Extract the slice, but handle variable length
        if end_idx <= len(data_part):
            # Full 16-character slice available
            slice_data = data_part[start_idx:end_idx]
        else:
            # Partial slice - pad with spaces to maintain consistency
            available_slice = data_part[start_idx:]
            slice_data = available_slice.ljust(16)  # Pad with spaces if needed

        value, lli, ssi = self.parse_observation_slice(slice_data)

        observation = Observation(
            obs_type=band.split("|")[1][0],
            value=value,
            lli=lli,
            ssi=ssi,
        )
        satellite.add_observation(observation)

    return satellite
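The slicing loop above is plain fixed-width chunking: each observation occupies a 16-character field, and a truncated trailing field is padded back to full width. A minimal sketch of just that step (`split_fields` is a hypothetical helper for illustration):

```python
def split_fields(data_part: str, n_fields: int, width: int = 16) -> list[str]:
    """Slice a RINEX data line into fixed-width fields, padding short tails."""
    fields = []
    for i in range(n_fields):
        chunk = data_part[i * width:(i + 1) * width]
        fields.append(chunk.ljust(width))  # pad a truncated trailing field
    return fields

line = "  24587765.123 7  24587760.891 6"
print(split_fields(line, 3))  # third field is all spaces (no data)
```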

iter_epochs()

Yield epochs one by one instead of materializing the whole list.

Returns

Generator
    Generator yielding Rnxv3ObsEpochRecord objects

Yields

Rnxv3ObsEpochRecord
    Each epoch with timestamp and satellite observations

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
    """Yield epochs one by one instead of materializing the whole list.

    Returns
    -------
    Generator
        Generator yielding Rnxv3ObsEpochRecord objects

    Yields
    ------
    Rnxv3ObsEpochRecord
        Each epoch with timestamp and satellite observations

    """
    for start, end in self.get_epoch_record_batches():
        try:
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )

            # Skip event epochs (flag 2-6: special records, not observations)
            if info.epoch_flag > 1:
                continue

            # Filter out blank/whitespace-only lines from data slice
            data = [line for line in self._lines[start + 1 : end] if line.strip()]
            epoch = Rnxv3ObsEpochRecord(
                info=info,
                data=[self.process_satellite_data(line) for line in data],
            )
            yield epoch
        except (InvalidEpochError, IncompleteEpochError, ValueError):
            # Skip epochs with validation errors (invalid SV, malformed data,
            # pydantic ValidationError inherits from ValueError)
            pass

iter_epochs_in_range(start, end)

Yield epochs lazily that fall into the given datetime range.

Parameters

start : datetime
    Start of time range (inclusive)
end : datetime
    End of time range (inclusive)

Returns

Generator
    Generator yielding epochs in the specified range

Yields

Rnxv3ObsEpochRecord
    Epochs within the time range

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs_in_range(
    self,
    start: datetime,
    end: datetime,
) -> Iterable[Rnxv3ObsEpochRecord]:
    """Yield epochs lazily that fall into the given datetime range.

    Parameters
    ----------
    start : datetime
        Start of time range (inclusive)
    end : datetime
        End of time range (inclusive)

    Returns
    -------
    Generator
        Generator yielding epochs in the specified range

    Yields
    ------
    Rnxv3ObsEpochRecord
        Epochs within the time range

    """
    for epoch in self.iter_epochs():
        dt = self.get_datetime_from_epoch_record_info(epoch.info)
        if start <= dt <= end:
            yield epoch
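The range filter above is a lazy inclusive filter over a stream of timestamps. A self-contained sketch of the same pattern using plain datetimes (`in_range` is a hypothetical stand-in for the epoch iterator):

```python
from datetime import datetime, timedelta, timezone

def in_range(stamps, start, end):
    """Lazily yield timestamps inside [start, end], mirroring the inclusive filter."""
    for dt in stamps:
        if start <= dt <= end:
            yield dt

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ts = [t0 + timedelta(seconds=30 * i) for i in range(10)]  # 30 s cadence
sel = list(in_range(ts, t0 + timedelta(seconds=60), t0 + timedelta(seconds=150)))
print(len(sel))  # → 4 (epochs at 60, 90, 120, 150 s)
```

Because the underlying iterator is a generator, nothing outside the window is materialized.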

get_datetime_from_epoch_record_info(epoch_record_info)

Convert epoch record info to datetime object.

Parameters

epoch_record_info : Rnxv3ObsEpochRecordLineModel
    Parsed epoch record line

Returns

datetime
    Timestamp from epoch record

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_datetime_from_epoch_record_info(
    self,
    epoch_record_info: Rnxv3ObsEpochRecordLineModel,
) -> datetime:
    """Convert epoch record info to datetime object.

    Parameters
    ----------
    epoch_record_info : Rnxv3ObsEpochRecordLineModel
        Parsed epoch record line

    Returns
    -------
    datetime
        Timestamp from epoch record

    """
    return datetime(
        year=int(epoch_record_info.year),
        month=int(epoch_record_info.month),
        day=int(epoch_record_info.day),
        hour=int(epoch_record_info.hour),
        minute=int(epoch_record_info.minute),
        second=int(epoch_record_info.seconds),
        tzinfo=UTC,
    )

epochrecordinfo_dt_to_numpy_dt(epch) staticmethod

Convert Python datetime to numpy datetime64[ns].

Parameters

epch : Rnxv3ObsEpochRecord
    Epoch record containing timestamp info

Returns

np.datetime64
    Numpy datetime64 with nanosecond precision

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@staticmethod
def epochrecordinfo_dt_to_numpy_dt(
    epch: Rnxv3ObsEpochRecord,
) -> np.datetime64:
    """Convert Python datetime to numpy datetime64[ns].

    Parameters
    ----------
    epch : Rnxv3ObsEpochRecord
        Epoch record containing timestamp info

    Returns
    -------
    np.datetime64
        Numpy datetime64 with nanosecond precision

    """
    dt = datetime(
        year=int(epch.info.year),
        month=int(epch.info.month),
        day=int(epch.info.day),
        hour=int(epch.info.hour),
        minute=int(epch.info.minute),
        second=int(epch.info.seconds),
        tzinfo=UTC,
    )
    # np.datetime64 doesn't support timezone info, but datetime is already UTC
    # Convert to naive datetime (UTC) to avoid warning
    return np.datetime64(dt.replace(tzinfo=None), "ns")
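The tz-stripping step above matters because `np.datetime64` does not accept timezone-aware datetimes without a deprecation warning. The step in isolation, stdlib only (the `np.datetime64` call itself is left as a comment, since numpy is assumed here):

```python
from datetime import datetime, timezone

aware = datetime(2024, 3, 1, 12, 30, 15, tzinfo=timezone.utc)
naive = aware.replace(tzinfo=None)  # drop tzinfo; wall-clock fields are unchanged
print(naive.isoformat())  # → 2024-03-01T12:30:15
# np.datetime64(naive, "ns") now converts without a timezone warning
```

This is safe only because the datetime is already known to be UTC; stripping tzinfo from a local-time value would silently shift the timestamp's meaning.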

infer_sampling_interval()

Infer sampling interval from consecutive epoch deltas.

Returns

pint.Quantity or None
    Sampling interval in seconds, or None if it cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_sampling_interval(self) -> pint.Quantity | None:
    """Infer sampling interval from consecutive epoch deltas.

    Returns
    -------
    pint.Quantity or None
        Sampling interval in seconds, or None if cannot be inferred

    """
    dts = self._epoch_datetimes()
    if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
        return None
    # Compute deltas
    deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
    if not deltas:
        return None
    # Pick the most common delta (robust to an occasional missing epoch)
    seconds = Counter(
        int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
    )
    if not seconds:
        return None
    mode_seconds, _ = seconds.most_common(1)[0]
    return (mode_seconds * UREG.second).to(UREG.seconds)

infer_dump_interval(sampling_interval=None)

Infer the intended dump interval for the RINEX file.

Parameters

sampling_interval : pint.Quantity, optional
    Known sampling interval. If provided, returns (#epochs * sampling_interval)

Returns

pint.Quantity or None
    Dump interval in seconds, or None if it cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_dump_interval(
    self, sampling_interval: pint.Quantity | None = None
) -> pint.Quantity | None:
    """Infer the intended dump interval for the RINEX file.

    Parameters
    ----------
    sampling_interval : pint.Quantity, optional
        Known sampling interval. If provided, returns (#epochs * sampling_interval)

    Returns
    -------
    pint.Quantity or None
        Dump interval in seconds, or None if cannot be inferred

    """
    idx = self.get_epoch_record_batches()
    n_epochs = len(idx)
    if n_epochs == 0:
        return None

    if sampling_interval is not None:
        return (n_epochs * sampling_interval).to(UREG.seconds)

    # Fallback: time coverage inclusive (last - first) + typical step
    dts = self._epoch_datetimes()
    if len(dts) == 0:
        return None
    if len(dts) == 1:
        # single epoch: treat as 1 * unknown step (cannot infer)
        return None

    # Estimate step from data
    est_step = self.infer_sampling_interval()
    if est_step is None:
        return None

    # Inclusive coverage often equals (n_epochs - 1) * step; intended
    # dump interval is n_epochs * step.
    return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

validate_epoch_completeness(dump_interval=None, sampling_interval=None)

Validate that the number of epochs matches the expected dump interval.

Parameters

dump_interval : str or pint.Quantity, optional
    Expected file dump interval. If None, inferred from epochs.
sampling_interval : str or pint.Quantity, optional
    Expected sampling interval. If None, inferred from epochs.

Returns

None

Raises

MissingEpochError
    If total sampling time doesn't match dump interval
ValueError
    If intervals cannot be inferred

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_epoch_completeness(
    self,
    dump_interval: str | pint.Quantity | None = None,
    sampling_interval: str | pint.Quantity | None = None,
) -> None:
    """Validate that the number of epochs matches the expected dump interval.

    Parameters
    ----------
    dump_interval : str or pint.Quantity, optional
        Expected file dump interval. If None, inferred from epochs.
    sampling_interval : str or pint.Quantity, optional
        Expected sampling interval. If None, inferred from epochs.

    Returns
    -------
    None

    Raises
    ------
    MissingEpochError
        If total sampling time doesn't match dump interval
    ValueError
        If intervals cannot be inferred

    """
    # Normalize/Infer sampling interval
    if sampling_interval is None:
        inferred = self.infer_sampling_interval()
        if inferred is None:
            msg = "Could not infer sampling interval from epochs"
            raise ValueError(msg)
        sampling_interval = inferred
    # normalize to pint
    elif not isinstance(sampling_interval, pint.Quantity):
        sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

    # Normalize/Infer dump interval
    if dump_interval is None:
        inferred_dump = self.infer_dump_interval(
            sampling_interval=sampling_interval
        )
        if inferred_dump is None:
            msg = "Could not infer dump interval from file"
            raise ValueError(msg)
        dump_interval = inferred_dump
    elif not isinstance(dump_interval, pint.Quantity):
        # Accept '15 min', '1h', etc.
        dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

    # Build inputs for the validator model
    epoch_indices = self.get_epoch_record_batches()

    # This throws MissingEpochError automatically if inconsistent
    cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
        epoch_records_indeces=epoch_indices,
        rnx_file_dump_interval=dump_interval,
        sampling_interval=sampling_interval,
    )
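At its core, the completeness check compares the observed epoch count against `dump_interval / sampling_interval`. A minimal sketch of that arithmetic, without pint or the pydantic validator model (`check_completeness` is a hypothetical stand-in, raising a plain `ValueError` instead of `MissingEpochError`):

```python
def check_completeness(n_epochs: int, dump_s: float, sampling_s: float) -> None:
    """Raise if the epoch count doesn't fill the dump interval (hypothetical check)."""
    expected = round(dump_s / sampling_s)
    if n_epochs != expected:
        raise ValueError(f"expected {expected} epochs, found {n_epochs}")

check_completeness(n_epochs=30, dump_s=900, sampling_s=30)  # 15 min at 30 s → OK
```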

filter_by_overlapping_groups(ds, group_preference=None)

Filter overlapping bands using per-group preferences.

Parameters

ds : xr.Dataset
    Dataset with sid dimension and signal properties.
group_preference : dict[str, str], optional
    Mapping of overlap group to preferred band.

Returns

xr.Dataset
    Dataset filtered to preferred overlapping bands.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def filter_by_overlapping_groups(
    self,
    ds: xr.Dataset,
    group_preference: dict[str, str] | None = None,
) -> xr.Dataset:
    """Filter overlapping bands using per-group preferences.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset with `sid` dimension and signal properties.
    group_preference : dict[str, str], optional
        Mapping of overlap group to preferred band.

    Returns
    -------
    xr.Dataset
        Dataset filtered to preferred overlapping bands.

    """
    if group_preference is None:
        group_preference = {
            "L1_E1_B1I": "L1",
            "L5_E5a": "L5",
            "L2_E5b_B2b": "L2",
        }

    keep = []
    for sid in ds.sid.values:
        parts = str(sid).split("|")
        band = parts[1] if len(parts) >= 2 else ""
        group = self._signal_mapper.get_overlapping_group(band)
        if group and group in group_preference:
            if band == group_preference[group]:
                keep.append(sid)
        else:
            keep.append(sid)
    return ds.sel(sid=keep)
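The keep/drop logic above reduces to a per-band lookup: a signal ID survives unless its band belongs to an overlap group with a different preferred band. A self-contained sketch with a hypothetical band-to-group table standing in for the reader's signal mapper:

```python
# Hypothetical band→overlap-group table (the reader resolves this via its signal mapper)
OVERLAP_GROUP = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
PREFERENCE = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}

def keep_sid(sid: str) -> bool:
    """Keep a 'SV|band|...' signal ID unless it loses to a preferred overlapping band."""
    parts = sid.split("|")
    band = parts[1] if len(parts) >= 2 else ""
    group = OVERLAP_GROUP.get(band)
    if group and group in PREFERENCE:
        return band == PREFERENCE[group]
    return True  # bands outside any overlap group always pass

sids = ["G01|L1|C", "E12|E1|C", "G01|L5|Q", "R05|G1|C"]
print([s for s in sids if keep_sid(s)])  # → ['G01|L1|C', 'G01|L5|Q', 'R05|G1|C']
```

The Galileo E1 entry is dropped because its group prefers GPS L1; the GLONASS G1 entry passes untouched because it belongs to no overlap group.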

create_rinex_netcdf_with_signal_id(start=None, end=None)

Create a NetCDF dataset with signal IDs.

Always uses the fast single-pass path. Optionally restricts to epochs within a datetime range via post-filtering.

Parameters

start : datetime, optional
    Start of time range (inclusive).
end : datetime, optional
    End of time range (inclusive).

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def create_rinex_netcdf_with_signal_id(
    self,
    start: datetime | None = None,
    end: datetime | None = None,
) -> xr.Dataset:
    """Create a NetCDF dataset with signal IDs.

    Always uses the fast single-pass path.  Optionally restricts to
    epochs within a datetime range via post-filtering.

    Parameters
    ----------
    start : datetime, optional
        Start of time range (inclusive).
    end : datetime, optional
        End of time range (inclusive).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid).

    """
    ds = self._create_dataset_single_pass()

    if start or end:
        ds = ds.sel(epoch=slice(start, end))

    return ds

to_ds(keep_data_vars=None, **kwargs)

Convert RINEX observations to xarray.Dataset with signal ID structure.

Parameters

outname : Path or str, optional
    If provided, saves dataset to this file path
keep_data_vars : list of str or None, optional
    Data variables to include in dataset. Defaults to config value.
write_global_attrs : bool, default False
    If True, adds comprehensive global attributes
pad_global_sid : bool, default True
    If True, pads to global signal ID space
strip_fillval : bool, default True
    If True, removes fill values
add_future_datavars : bool, default True
    If True, adds placeholder variables for future data
keep_sids : list of str or None, default None
    If provided, filters/pads dataset to these specific SIDs.
    If None and pad_global_sid=True, pads to all possible SIDs.

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid) and requested data variables

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert RINEX observations to xarray.Dataset with signal ID structure.

    Parameters
    ----------
    outname : Path or str, optional
        If provided, saves dataset to this file path
    keep_data_vars : list of str or None, optional
        Data variables to include in dataset. Defaults to config value.
    write_global_attrs : bool, default False
        If True, adds comprehensive global attributes
    pad_global_sid : bool, default True
        If True, pads to global signal ID space
    strip_fillval : bool, default True
        If True, removes fill values
    add_future_datavars : bool, default True
        If True, adds placeholder variables for future data
    keep_sids : list of str or None, default None
        If provided, filters/pads dataset to these specific SIDs.
        If None and pad_global_sid=True, pads to all possible SIDs.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid) and requested data variables

    """
    outname = cast(Path | str | None, kwargs.pop("outname", None))
    write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
    pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
    strip_fillval = bool(kwargs.pop("strip_fillval", True))
    add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
    keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

    if keep_data_vars is None:
        from canvod.utils.config import load_config

        keep_data_vars = load_config().processing.processing.keep_rnx_vars

    ds = self.create_rinex_netcdf_with_signal_id()

    # drop unwanted vars
    for var in list(ds.data_vars):
        if var not in keep_data_vars:
            ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        # Pad/filter to specified sids or all possible sids
        ds = pad_to_global_sid(ds, keep_sids=keep_sids)

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    if add_future_datavars:
        pass

    if write_global_attrs:
        ds.attrs.update(self._create_comprehensive_attrs())

    ds.attrs.update(self._build_attrs())

    if outname:
        from canvod.utils.config import load_config as _load_config

        comp = _load_config().processing.compression
        encoding = {
            var: {"zlib": comp.zlib, "complevel": comp.complevel}
            for var in ds.data_vars
        }
        ds.to_netcdf(str(outname), encoding=encoding)

    # Validate output structure for pipeline compatibility
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

validate_rinex_304_compliance(ds=None, strict=False, print_report=True)

Run enhanced RINEX 3.04 specification validation.

Validates:
1. System-specific observation codes
2. GLONASS mandatory fields (slot/frequency, biases)
3. Phase shift records (RINEX 3.01+)
4. Observation value ranges

Parameters

ds : xr.Dataset, optional
    Dataset to validate. If None, creates one from current file.
strict : bool
    If True, raise ValueError on validation failures
print_report : bool
    If True, print validation report to console

Returns

dict[str, list[str]]
    Validation results by category

Examples

>>> reader = Rnxv3Obs(fpath="station.24o")
>>> results = reader.validate_rinex_304_compliance()

Or validate a specific dataset

>>> ds = reader.to_ds()
>>> results = reader.validate_rinex_304_compliance(ds=ds)

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_rinex_304_compliance(
    self,
    ds: xr.Dataset | None = None,
    strict: bool = False,
    print_report: bool = True,
) -> dict[str, list[str]]:
    """Run enhanced RINEX 3.04 specification validation.

    Validates:
    1. System-specific observation codes
    2. GLONASS mandatory fields (slot/frequency, biases)
    3. Phase shift records (RINEX 3.01+)
    4. Observation value ranges

    Parameters
    ----------
    ds : xr.Dataset, optional
        Dataset to validate. If None, creates one from current file.
    strict : bool
        If True, raise ValueError on validation failures
    print_report : bool
        If True, print validation report to console

    Returns
    -------
    dict[str, list[str]]
        Validation results by category

    Examples
    --------
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> results = reader.validate_rinex_304_compliance()
    >>> # Or validate a specific dataset
    >>> ds = reader.to_ds()
    >>> results = reader.validate_rinex_304_compliance(ds=ds)

    """
    if ds is None:
        ds = self.to_ds(write_global_attrs=False)

    # Prepare header dict for validators
    header_dict: dict[str, Any] = {
        "obs_codes_per_system": self.header.obs_codes_per_system,
    }

    # Add GLONASS-specific headers if available
    if hasattr(self.header, "glonass_slot_frq"):
        header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

    if hasattr(self.header, "glonass_cod_phs_bis"):
        header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

    if hasattr(self.header, "phase_shift"):
        header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

    # Run validation
    results = RINEX304ComplianceValidator.validate_all(
        ds=ds, header_dict=header_dict, strict=strict
    )

    if print_report:
        RINEX304ComplianceValidator.print_validation_report(results)

    return results

SbfReader

Bases: GNSSDataReader

Read and decode a Septentrio Binary Format (SBF) observation file.

Parameters

fpath : Path
    Path to the *.sbf (or *.SBF, or receiver-named) binary file.

Examples

>>> reader = SbfReader(fpath=Path("rref213a00.25_"))
>>> print(reader.header.rx_version)
4.14.4
>>> for epoch in reader.iter_epochs():
...     for obs in epoch.observations:
...         print(obs.system, obs.prn, obs.cn0)

Notes

  • All physical-unit conversions follow RefGuide-4.14.0.
  • Physical quantities are expressed as :class:pint.Quantity objects using the shared :data:~canvod.readers.gnss_specs.constants.UREG.
  • GLONASS FDMA frequencies are resolved from the most recently seen ChannelStatus block; observations before the first ChannelStatus for a given SVID have phase_cycles=None.
  • The file is scanned once per :meth:iter_epochs call; use :attr:num_epochs for a pre-computed count (scans once on first access).
  • Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.
Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
class SbfReader(GNSSDataReader):
    """Read and decode a Septentrio Binary Format (SBF) observation file.

    Parameters
    ----------
    fpath : Path
        Path to the ``*.sbf`` (or ``*.SBF``, or receiver-named) binary file.

    Examples
    --------
    >>> reader = SbfReader(fpath=Path("rref213a00.25_"))
    >>> print(reader.header.rx_version)
    4.14.4
    >>> for epoch in reader.iter_epochs():
    ...     for obs in epoch.observations:
    ...         print(obs.system, obs.prn, obs.cn0)

    Notes
    -----
    - All physical-unit conversions follow RefGuide-4.14.0.
    - Physical quantities are expressed as :class:`pint.Quantity` objects
      using the shared :data:`~canvod.readers.gnss_specs.constants.UREG`.
    - GLONASS FDMA frequencies are resolved from ChannelStatus blocks; the
      file is pre-scanned (see :attr:`_freq_nr_cache`), so epochs before the
      first ChannelStatus for a given SVID still get correct assignments.
      SVIDs that never appear in any ChannelStatus block have
      ``phase_cycles=None``.
    - The file is scanned once per :meth:`iter_epochs` call; use
      :attr:`num_epochs` for a pre-computed count (scans once on first access).
    - Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
      from :class:`GNSSDataReader`.
    """

    model_config = ConfigDict(extra="ignore")

    @property
    def source_format(self) -> str:
        return "sbf"

    # ------------------------------------------------------------------
    # Pre-scan caches
    # ------------------------------------------------------------------

    @cached_property
    def _freq_nr_cache(self) -> dict[int, int]:
        """Pre-scan ALL ChannelStatus blocks to build a complete SVID → FreqNr map.

        Scanning the entire file once means early GLONASS epochs also have
        accurate FDMA frequency assignments in :meth:`iter_epochs`.

        Returns
        -------
        dict of {int: int}
            Mapping from Septentrio SVID to GLONASS frequency slot number.
        """
        parser = sbf_parser.SbfParser()
        cache: dict[int, int] = {}
        for name, data in parser.read(str(self.fpath)):
            if name == "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        cache[svid] = int(sat["FreqNr"])
        return cache
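
    # Illustrative usage (not part of the class API; file path is
    # hypothetical):
    #
    #     reader = SbfReader(fpath=Path("rref213a00.25_"))
    #     cache = reader._freq_nr_cache      # {svid: freq_nr, ...}
    #     # iter_epochs() works on a copy of this mapping, so ChannelStatus
    #     # blocks seen during iteration never mutate the cached pre-scan.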

    # ------------------------------------------------------------------
    # GNSSDataReader abstract property implementations
    # ------------------------------------------------------------------

    @cached_property
    def file_hash(self) -> str:
        """SHA-256 hex digest of the file (first 16 characters).

        Returns
        -------
        str
            16-character hexadecimal prefix of the SHA-256 hash.
        """
        h = hashlib.sha256(self.fpath.read_bytes())
        return h.hexdigest()[:16]

    @cached_property
    def start_time(self) -> datetime:
        """Return the timestamp of the first decoded epoch.

        Returns
        -------
        datetime
            Timezone-aware UTC datetime of the first observation epoch.

        Raises
        ------
        LookupError
            If the file contains no decodable epochs.
        """
        for epoch in self.iter_epochs():
            return epoch.timestamp
        raise LookupError(f"No epochs in {self.fpath}")

    @cached_property
    def end_time(self) -> datetime:
        """Return the timestamp of the last decoded epoch.

        Returns
        -------
        datetime
            Timezone-aware UTC datetime of the last observation epoch.

        Raises
        ------
        LookupError
            If the file contains no decodable epochs.
        """
        last: datetime | None = None
        for epoch in self.iter_epochs():
            last = epoch.timestamp
        if last is None:
            raise LookupError(f"No epochs in {self.fpath}")
        return last

    @cached_property
    def systems(self) -> list[str]:
        """Return sorted list of GNSS system codes present in the file.

        Returns
        -------
        list of str
            Sorted list of RINEX system letters (e.g. ``["E", "G", "R"]``).
        """
        return sorted(
            {obs.system for ep in self.iter_epochs() for obs in ep.observations}
        )

    @cached_property
    def num_satellites(self) -> int:
        """Return the number of unique satellites observed in the file.

        Returns
        -------
        int
            Count of unique ``system + PRN`` pairs across all epochs.
        """
        return len(
            {
                f"{obs.system}{obs.prn:02d}"
                for ep in self.iter_epochs()
                for obs in ep.observations
            }
        )

    # ------------------------------------------------------------------
    # Epoch count (existing cached property — kept for backward compat)
    # ------------------------------------------------------------------

    @cached_property
    def num_epochs(self) -> int:
        """Count the number of MeasEpoch blocks in the file.

        Returns
        -------
        int
            Total MeasEpoch block count (one per observation epoch).

        Notes
        -----
        Scans the entire file once; result is cached.
        """
        parser = sbf_parser.SbfParser()
        count = sum(
            1 for name, _ in parser.read(str(self.fpath)) if name == "MeasEpoch"
        )
        log.debug("sbf_epoch_count", fpath=str(self.fpath), num_epochs=count)
        return count

    # ------------------------------------------------------------------
    # Header
    # ------------------------------------------------------------------

    @cached_property
    def header(self) -> SbfHeader:
        """Parse the first ReceiverSetup block in the file.

        Returns
        -------
        SbfHeader
            Receiver metadata.

        Raises
        ------
        LookupError
            If no ReceiverSetup block is found.
        """
        parser = sbf_parser.SbfParser()
        for name, data in parser.read(str(self.fpath)):
            if name == "ReceiverSetup":
                return SbfHeader(
                    marker_name=_decode_bytes(data["MarkerName"]),
                    marker_number=_decode_bytes(data["MarkerNumber"]),
                    observer=_decode_bytes(data["Observer"]),
                    agency=_decode_bytes(data["Agency"]),
                    rx_serial=_decode_bytes(data["RxSerialNumber"]),
                    rx_name=_decode_bytes(data["RxName"]),
                    rx_version=_decode_bytes(data["RxVersion"]),
                    ant_serial=_decode_bytes(data["AntSerialNbr"]),
                    ant_type=_decode_bytes(data["AntType"]),
                    delta_h=float(data["deltaH"]) * UREG.meter,
                    delta_e=float(data["deltaE"]) * UREG.meter,
                    delta_n=float(data["deltaN"]) * UREG.meter,
                    latitude_rad=float(data["Latitude"]),
                    longitude_rad=float(data["Longitude"]),
                    height_m=float(data["Height"]) * UREG.meter,
                    gnss_fw_version=_decode_bytes(data["GNSSFWVersion"]),
                    product_name=_decode_bytes(data["ProductName"]),
                )
        raise LookupError(f"No ReceiverSetup block found in {self.fpath}")

    # ------------------------------------------------------------------
    # Epoch iterator
    # ------------------------------------------------------------------

    def iter_epochs(self) -> Iterator[SbfEpoch]:
        """Iterate over decoded MeasEpoch blocks.

        Yields decoded :class:`SbfEpoch` objects with all signal observations
        converted to physical units as :class:`pint.Quantity`.

        Yields
        ------
        SbfEpoch
            One decoded observation epoch.

        Notes
        -----
        - The file is scanned from start to finish on each call.
        - The :attr:`_freq_nr_cache` is built from a full pre-scan of all
          ChannelStatus blocks in the file, so every GLONASS FDMA epoch has
          accurate carrier frequencies.
        - ``delta_ls`` (leap seconds) is taken from the most recent
          ReceiverTime block; defaults to 18 if none has been seen yet.
        """
        parser = sbf_parser.SbfParser()
        freq_nr_cache: dict[int, int] = self._freq_nr_cache.copy()
        delta_ls: int = _DEFAULT_DELTA_LS

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid = int(sat["SVID"])
                        if svid != 0:
                            freq_nr_cache[svid] = int(sat["FreqNr"])

                case "MeasEpoch":
                    epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                    if epoch is not None:
                        yield epoch
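
    # Illustrative usage (mirrors the class docstring example; file path is
    # hypothetical):
    #
    #     reader = SbfReader(fpath=Path("rref213a00.25_"))
    #     for epoch in reader.iter_epochs():
    #         for obs in epoch.observations:
    #             if obs.cn0 is not None:
    #                 print(obs.system, obs.prn, obs.cn0.to(UREG.dBHz))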

    # ------------------------------------------------------------------
    # Dataset construction — observations
    # ------------------------------------------------------------------

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        pad_global_sid: bool = True,
        strip_fillval: bool = True,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert SBF observations to an ``(epoch, sid)`` xarray Dataset.

        Produces the same structure as :class:`~canvod.readers.rinex.v3_04.Rnxv3Obs`
        and passes :func:`~canvod.readers.base.validate_dataset`.

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to retain.  If ``None``, all five variables are
            kept: ``SNR``, ``Pseudorange``, ``Phase``, ``Doppler``, ``SSI``.
            Note: ``LLI`` is not produced — SBF has no loss-of-lock indicator.
        pad_global_sid : bool, default True
            If ``True``, pads the dataset to the global SID space via
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.
        strip_fillval : bool, default True
            If ``True``, removes fill values via
            :func:`canvod.auxiliary.preprocessing.strip_fillvalue`.
        **kwargs
            Optional ``keep_sids`` (list of str) is forwarded to
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`; all
            other keys are ignored.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions ``(epoch, sid)`` that passes
            :func:`~canvod.readers.base.validate_dataset`.
        """
        import math

        freq_nr_cache = self._freq_nr_cache.copy()

        # --- Single pass: collect timestamps, SID properties, and per-epoch obs ---
        # Stores per-epoch obs as dicts (SID → value) so we only scan the file once.
        # Array construction happens afterwards in fast in-memory loops.
        sid_props: dict[str, dict[str, Any]] = {}
        timestamps: list[np.datetime64] = []
        # Per-epoch accumulator: list of (snr_dict, pr_dict, ph_dict, dop_dict)
        epoch_rows: list[
            tuple[
                dict[str, float], dict[str, float], dict[str, float], dict[str, float]
            ]
        ] = []

        for epoch in self.iter_epochs():
            ts_np = np.datetime64(epoch.timestamp.replace(tzinfo=None), "ns")
            timestamps.append(ts_np)

            e_snr: dict[str, float] = {}
            e_pr: dict[str, float] = {}
            e_ph: dict[str, float] = {}
            e_dop: dict[str, float] = {}

            for obs in epoch.observations:
                props = _sid_props_from_obs(obs.svid, obs.signal_num, freq_nr_cache)
                if props is None:
                    continue
                sid = props["sid"]
                if sid not in sid_props:
                    sid_props[sid] = props
                if obs.cn0 is not None:
                    e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                if obs.pseudorange is not None:
                    e_pr[sid] = float(obs.pseudorange.to(UREG.meter).magnitude)
                if obs.phase_cycles is not None:
                    e_ph[sid] = obs.phase_cycles
                if obs.doppler is not None:
                    e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)

            epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

        sorted_sids = sorted(sid_props)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(timestamps)
        n_sids = len(sorted_sids)

        # Allocate arrays (LLI is dropped — SBF has no loss-of-lock indicator)
        snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
            for sid, val in e_snr.items():
                snr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_pr.items():
                pr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_ph.items():
                ph_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_dop.items():
                dop_arr[t_idx, sid_to_idx[sid]] = val

        # Build coordinate arrays
        freq_center = np.asarray(
            [sid_props[s]["freq_center"] for s in sorted_sids],
            dtype=DTYPES["freq_center"],
        )
        freq_min = np.asarray(
            [sid_props[s]["freq_min"] for s in sorted_sids], dtype=DTYPES["freq_min"]
        )
        freq_max = np.asarray(
            [sid_props[s]["freq_max"] for s in sorted_sids], dtype=DTYPES["freq_max"]
        )

        coords: dict[str, Any] = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        attrs = cast(dict[str, Any], self._build_attrs())

        # Add ECEF position from ReceiverSetup header for pipeline compatibility.
        # ECEFPosition.from_ds_metadata() reads "APPROX POSITION X/Y/Z".
        try:
            import pymap3d as pm

            hdr = self.header
            lat_deg = math.degrees(hdr.latitude_rad)
            lon_deg = math.degrees(hdr.longitude_rad)
            h_m = float(hdr.height_m.to(UREG.meter).magnitude)
            x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
            attrs["APPROX POSITION X"] = float(x)
            attrs["APPROX POSITION Y"] = float(y)
            attrs["APPROX POSITION Z"] = float(z)
        except (LookupError, AttributeError):
            pass  # SBF file without a ReceiverSetup block

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pr_arr,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
                "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
                "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
            },
            coords=coords,
            attrs=attrs,
        )

        # Post-process
        if keep_data_vars is not None:
            for var in list(ds.data_vars):
                if var not in keep_data_vars:
                    ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            ds = pad_to_global_sid(
                ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        validate_dataset(ds, required_vars=keep_data_vars)
        return ds
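
    # Illustrative usage (file path and SID label are hypothetical):
    #
    #     ds = SbfReader(fpath=Path("rref213a00.25_")).to_ds(
    #         keep_data_vars=["SNR"], pad_global_sid=False
    #     )
    #     snr = ds["SNR"].sel(sid="G01|L1|C")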

    # ------------------------------------------------------------------
    # Dataset construction — metadata
    # ------------------------------------------------------------------

    def to_metadata_ds(
        self, pad_global_sid: bool = True, **kwargs: object
    ) -> xr.Dataset:
        """Decode SBF metadata blocks to an ``(epoch, sid)`` xarray Dataset.

        Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and
        MeasExtra blocks in a single file scan.

        Parameters
        ----------
        pad_global_sid : bool, default True
            If ``True``, pads to the global SID space via
            :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions ``(epoch, sid)``.  Epoch-level scalars
            (PDOP, NrSV, …) are 1-D ``(epoch,)`` coordinates.  Satellite
            geometry (theta, phi) and signal quality (MPCorrection, …) are
            ``(epoch, sid)`` data variables.
        """
        parser = sbf_parser.SbfParser()
        freq_nr_cache = self._freq_nr_cache.copy()

        pending: dict[str, Any] = {
            "pvt": None,
            "dop": None,
            "status": None,
            "satvis": [],
            "extra": [],
        }

        # Each record: (ts, pvt, dop, status, satvis, extra, obs_map)
        records: list[tuple[Any, ...]] = []

        # sid discovery — same logic as to_ds() pass 1
        sid_props: dict[str, dict[str, Any]] = {}

        delta_ls: int = _DEFAULT_DELTA_LS

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid_cs = int(sat["SVID"])
                        if svid_cs != 0:
                            freq_nr_cache[svid_cs] = int(sat["FreqNr"])

                case "PVTGeodetic":
                    pending["pvt"] = data

                case "DOP":
                    pending["dop"] = data

                case "ReceiverStatus":
                    pending["status"] = data

                case "SatVisibility":
                    pending["satvis"] = list(data.get("SatInfo", []))

                case "MeasExtra":
                    pending["extra"] = list(data.get("MeasExtraChannel", []))

                case "MeasEpoch":
                    tow_ms = int(data["TOW"])
                    wn = int(data["WNc"])
                    ts = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                    obs_map = _build_obs_map(data)

                    # Discover sids from Type1 and Type2 sub-blocks
                    for t1 in data.get("Type_1", []):
                        svid1 = int(t1["SVID"])
                        props1 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props1 is not None and props1["sid"] not in sid_props:
                            sid_props[props1["sid"]] = props1

                        for t2 in t1.get("Type_2", []):
                            props2 = _sid_props_from_obs(
                                svid1,
                                decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                                freq_nr_cache,
                            )
                            if props2 is not None and props2["sid"] not in sid_props:
                                sid_props[props2["sid"]] = props2

                    records.append(
                        (
                            ts,
                            pending["pvt"],
                            pending["dop"],
                            pending["status"],
                            list(pending["satvis"]),
                            list(pending["extra"]),
                            obs_map,
                        )
                    )
                    pending = {
                        "pvt": None,
                        "dop": None,
                        "status": None,
                        "satvis": [],
                        "extra": [],
                    }

        # Build index structures
        sorted_sids = sorted(sid_props)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(records)
        n_sids = len(sorted_sids)

        # sv → list of sid indices (for SatVisibility broadcasting)
        sids_for_sv: dict[str, list[int]] = {}
        for sid in sorted_sids:
            sv = sid_props[sid]["sv"]
            sids_for_sv.setdefault(sv, []).append(sid_to_idx[sid])

        # (epoch, sid) data variable arrays
        theta_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        phi_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        rise_set_arr = np.full((n_epochs, n_sids), -1, dtype=np.int8)
        mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        smoothing_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        code_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        carr_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        lock_time_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        cum_loss_cont_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        car_mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
        cn0_highres_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

        # (epoch,) scalar coordinate arrays
        pdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        hdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        vdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        n_sv_arr = np.full(n_epochs, -1, dtype=np.int16)
        h_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        v_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        pvt_mode_arr = np.full(n_epochs, -1, dtype=np.int8)
        mean_corr_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        cpu_load_arr = np.full(n_epochs, -1, dtype=np.int8)
        temp_arr = np.full(n_epochs, np.nan, dtype=np.float32)
        rx_error_arr = np.full(n_epochs, 0, dtype=np.int32)

        timestamps: list[np.datetime64] = []

        # Fill arrays from records
        for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
            timestamps.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

            # DOP block → pdop, hdop, vdop
            if dop is not None:
                try:
                    pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            # PVTGeodetic → n_sv, accuracy, mode, correction age
            if pvt is not None:
                try:
                    n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                    raw_hacc = int(pvt["HAccuracy"])
                    if raw_hacc != 65535:
                        h_acc_arr[t_idx] = raw_hacc * 0.01
                    raw_vacc = int(pvt["VAccuracy"])
                    if raw_vacc != 65535:
                        v_acc_arr[t_idx] = raw_vacc * 0.01
                    pvt_mode_arr[t_idx] = int(pvt["Mode"])
                    mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                    # Also pick up DOP from PVTGeodetic if DOP block absent
                    if np.isnan(pdop_arr[t_idx]):
                        pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                        hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                        vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            # ReceiverStatus → cpu_load, temperature, rx_error
            if status is not None:
                try:
                    cpu_load_arr[t_idx] = int(status["CPULoad"])
                    raw_temp = int(status["Temperature"])
                    if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                        temp_arr[t_idx] = float(raw_temp - 100)
                    rx_error_arr[t_idx] = int(status["RxError"])
                except (KeyError, TypeError, ValueError):
                    pass

            # SatVisibility → broadcast theta/phi to all sids for that sv
            for sat_info in satvis:
                try:
                    svid_raw = int(sat_info["SVID"])
                    sys_code, prn = decode_svid(svid_raw)
                    sv = f"{sys_code}{prn:02d}"
                    theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                    phi_deg = int(sat_info["Azimuth"]) * 0.01
                    rs = int(sat_info["RiseSet"])
                    for s_idx in sids_for_sv.get(sv, []):
                        theta_arr[t_idx, s_idx] = theta_deg
                        phi_arr[t_idx, s_idx] = phi_deg
                        rise_set_arr[t_idx, s_idx] = rs
                except (KeyError, TypeError, ValueError):
                    pass

            # MeasExtra → per-(epoch, sid) signal quality
            for ch in extra:
                try:
                    type_byte = int(ch["Type"])
                    info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                    sig_num = decode_signal_num(type_byte, info_byte)
                    rx_ch = int(ch["RxChannel"])
                    svid = obs_map.get((rx_ch, sig_num))
                    if svid is None:
                        continue
                    sig_def = SIGNAL_TABLE.get(sig_num)
                    if sig_def is None:
                        continue
                    sys_code2, prn2 = decode_svid(svid)
                    sv2 = f"{sys_code2}{prn2:02d}"
                    sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                    s_idx = sid_to_idx.get(sid)
                    if s_idx is None:
                        continue
                    # MPCorrection: scale 0.001 m/LSB; some parser outputs
                    # appear to emit the key with a trailing space, so try both.
                    mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                    mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                    # SmoothingCorr: i2, scale 0.001 m/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    raw_sc = ch.get("SmoothingCorr")
                    if raw_sc is not None:
                        smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                    raw_cv = ch.get("CodeVar")
                    # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cv is not None and int(raw_cv) != 65535:
                        code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                    raw_rv = ch.get("CarrierVar")
                    # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_rv is not None and int(raw_rv) != 65535:
                        carr_var_arr[t_idx, s_idx] = float(raw_rv)
                    raw_lt = ch.get("LockTime")
                    # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_lt is not None and int(raw_lt) != 65535:
                        lock_time_arr[t_idx, s_idx] = float(raw_lt)
                    raw_clc = ch.get("CumLossCont")
                    # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_clc is not None:
                        cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                    raw_cmc = ch.get("CarMPCorr")
                    # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cmc is not None:
                        car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                    raw_misc = ch.get("Misc")
                    # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_misc is not None:
                        cn0_hr = int(raw_misc) & 0x07
                        cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
                except (KeyError, TypeError, ValueError):
                    pass

        # Build Dataset
        freq_center = np.asarray(
            [sid_props[s]["freq_center"] for s in sorted_sids], dtype=np.float32
        )
        freq_min = np.asarray(
            [sid_props[s]["freq_min"] for s in sorted_sids], dtype=np.float32
        )
        freq_max = np.asarray(
            [sid_props[s]["freq_max"] for s in sorted_sids], dtype=np.float32
        )

        coords: dict[str, Any] = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
            # Epoch-level scalars (1-D over epoch)
            "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
            "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
            "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
            "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
            "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
            "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
            "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
            "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
            "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
            "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
            "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
        }

        attrs = self._build_attrs()

        ds = xr.Dataset(
            data_vars={
                "broadcast_theta": (
                    ["epoch", "sid"],
                    np.deg2rad(theta_arr),
                    _BROADCAST_THETA_ATTRS,
                ),
                "broadcast_phi": (
                    ["epoch", "sid"],
                    np.deg2rad(phi_arr),
                    _BROADCAST_PHI_ATTRS,
                ),
                "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
                "mp_correction_m": (
                    ["epoch", "sid"],
                    mp_corr_arr,
                    _MP_CORRECTION_ATTRS,
                ),
                "smoothing_corr_m": (
                    ["epoch", "sid"],
                    smoothing_corr_arr,
                    _SMOOTHING_CORR_ATTRS,
                ),
                "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
                "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
                "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
                "cum_loss_cont": (
                    ["epoch", "sid"],
                    cum_loss_cont_arr,
                    _CUM_LOSS_CONT_ATTRS,
                ),
                "car_mp_corr_cycles": (
                    ["epoch", "sid"],
                    car_mp_corr_arr,
                    _CAR_MP_CORR_ATTRS,
                ),
                "cn0_highres_correction": (
                    ["epoch", "sid"],
                    cn0_highres_arr,
                    _CN0_HIGHRES_CORRECTION_ATTRS,
                ),
            },
            coords=coords,
            attrs=attrs,
        )

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            ds = pad_to_global_sid(
                ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        return ds

    # ------------------------------------------------------------------
    # Combined single-pass: observations + auxiliary metadata
    # ------------------------------------------------------------------

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        pad_global_sid: bool = True,
        strip_fillval: bool = True,
        store_raw_observables: bool = True,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Single file scan producing both the obs dataset and the SBF metadata dataset.

        Performs ONE ``parser.read()`` pass, collecting MeasEpoch observations
        and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously.
        ``to_ds()`` and ``to_metadata_ds()`` remain unchanged for standalone use.

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to retain in the obs dataset.
        pad_global_sid : bool, default True
            Pad obs dataset to the global SID space.
        strip_fillval : bool, default True
            Strip fill values from the obs dataset.
        store_raw_observables : bool, default True
            Add pre-correction "raw" observable variables to the obs dataset:
            ``SNR_raw``, ``Pseudorange_unsmoothed``, ``Pseudorange_raw``,
            ``Phase_raw``.  Set to ``False`` to reduce dataset size when these
            are not needed.
        **kwargs
            Forwarded to ``pad_to_global_sid`` (e.g. ``keep_sids``).

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"sbf_obs": meta_ds})``.
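
        Examples
        --------
        A minimal sketch of the CN0HighRes decode this method applies to
        ``SNR`` (the ``Misc`` byte value is illustrative, not taken from a
        real file):

        ```python
        raw_misc = 0x05                # example Misc byte from a MeasExtra sub-block
        cn0_hr = raw_misc & 0x07       # bits 0-2 hold CN0HighRes (0-7)
        correction = cn0_hr * 0.03125  # scale 0.03125 dB-Hz/LSB -> 0.15625 dB-Hz
        ```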
        """
        import math

        parser = sbf_parser.SbfParser()
        freq_nr_cache = self._freq_nr_cache.copy()
        delta_ls: int = _DEFAULT_DELTA_LS

        # Separate sid discovery for obs (matches to_ds) and metadata (matches to_metadata_ds)
        sid_props_obs: dict[str, dict[str, Any]] = {}
        sid_props_meta: dict[str, dict[str, Any]] = {}

        # Obs-side accumulators (same as to_ds)
        timestamps_obs: list[np.datetime64] = []
        epoch_rows: list[
            tuple[
                dict[str, float], dict[str, float], dict[str, float], dict[str, float]
            ]
        ] = []

        # Metadata-side accumulators (same as to_metadata_ds)
        pending: dict[str, Any] = {
            "pvt": None,
            "dop": None,
            "status": None,
            "satvis": [],
            "extra": [],
        }
        records: list[tuple[Any, ...]] = []

        for name, data in parser.read(str(self.fpath)):
            match name:
                case "ReceiverTime":
                    delta_ls = int(data["DeltaLS"])

                case "ChannelStatus":
                    for sat in data.get("ChannelSatInfo", []):
                        svid = int(sat["SVID"])
                        if svid != 0:
                            freq_nr_cache[svid] = int(sat["FreqNr"])

                case "PVTGeodetic":
                    pending["pvt"] = data

                case "DOP":
                    pending["dop"] = data

                case "ReceiverStatus":
                    pending["status"] = data

                case "SatVisibility":
                    pending["satvis"] = list(data.get("SatInfo", []))

                case "MeasExtra":
                    pending["extra"] = list(data.get("MeasExtraChannel", []))

                case "MeasEpoch":
                    # --- Obs side ---
                    epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                    if epoch is not None:
                        ts_np = np.datetime64(
                            epoch.timestamp.replace(tzinfo=None), "ns"
                        )
                        timestamps_obs.append(ts_np)
                        e_snr: dict[str, float] = {}
                        e_pr: dict[str, float] = {}
                        e_ph: dict[str, float] = {}
                        e_dop: dict[str, float] = {}
                        for obs in epoch.observations:
                            props = _sid_props_from_obs(
                                obs.svid, obs.signal_num, freq_nr_cache
                            )
                            if props is None:
                                continue
                            sid = props["sid"]
                            if sid not in sid_props_obs:
                                sid_props_obs[sid] = props
                            if obs.cn0 is not None:
                                e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                            if obs.pseudorange is not None:
                                e_pr[sid] = float(
                                    obs.pseudorange.to(UREG.meter).magnitude
                                )
                            if obs.phase_cycles is not None:
                                e_ph[sid] = obs.phase_cycles
                            if obs.doppler is not None:
                                e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)
                        epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

                    # --- Metadata side (always, even if epoch decoded as None) ---
                    tow_ms = int(data["TOW"])
                    wn = int(data["WNc"])
                    ts_meta = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                    obs_map = _build_obs_map(data)

                    # Discover sids from Type1/Type2 sub-blocks (same as to_metadata_ds)
                    for t1 in data.get("Type_1", []):
                        svid1 = int(t1["SVID"])
                        props1 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props1 is not None and props1["sid"] not in sid_props_meta:
                            sid_props_meta[props1["sid"]] = props1
                        for t2 in t1.get("Type_2", []):
                            props2 = _sid_props_from_obs(
                                svid1,
                                decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                                freq_nr_cache,
                            )
                            if (
                                props2 is not None
                                and props2["sid"] not in sid_props_meta
                            ):
                                sid_props_meta[props2["sid"]] = props2

                    records.append(
                        (
                            ts_meta,
                            pending["pvt"],
                            pending["dop"],
                            pending["status"],
                            list(pending["satvis"]),
                            list(pending["extra"]),
                            obs_map,
                        )
                    )
                    pending = {
                        "pvt": None,
                        "dop": None,
                        "status": None,
                        "satvis": [],
                        "extra": [],
                    }

        # ----------------------------------------------------------------
        # Build obs dataset (verbatim from to_ds())
        # ----------------------------------------------------------------
        sorted_sids = sorted(sid_props_obs)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(timestamps_obs)
        n_sids = len(sorted_sids)

        snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
            for sid, val in e_snr.items():
                snr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_pr.items():
                pr_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_ph.items():
                ph_arr[t_idx, sid_to_idx[sid]] = val
            for sid, val in e_dop.items():
                dop_arr[t_idx, sid_to_idx[sid]] = val

        freq_center = np.asarray(
            [sid_props_obs[s]["freq_center"] for s in sorted_sids],
            dtype=DTYPES["freq_center"],
        )
        freq_min = np.asarray(
            [sid_props_obs[s]["freq_min"] for s in sorted_sids],
            dtype=DTYPES["freq_min"],
        )
        freq_max = np.asarray(
            [sid_props_obs[s]["freq_max"] for s in sorted_sids],
            dtype=DTYPES["freq_max"],
        )

        coords_obs: dict[str, Any] = {
            "epoch": ("epoch", timestamps_obs, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": (
                "sid",
                np.array([sid_props_obs[s]["sv"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                np.array(
                    [sid_props_obs[s]["system"] for s in sorted_sids], dtype=object
                ),
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                np.array([sid_props_obs[s]["band"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                np.array([sid_props_obs[s]["code"] for s in sorted_sids], dtype=object),
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        attrs = cast(dict[str, Any], self._build_attrs())

        try:
            import pymap3d as pm

            hdr = self.header
            lat_deg = math.degrees(hdr.latitude_rad)
            lon_deg = math.degrees(hdr.longitude_rad)
            h_m = float(hdr.height_m.to(UREG.meter).magnitude)
            x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
            attrs["APPROX POSITION X"] = float(x)
            attrs["APPROX POSITION Y"] = float(y)
            attrs["APPROX POSITION Z"] = float(z)
        except (ImportError, LookupError, AttributeError):
            pass

        obs_ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pr_arr,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
                "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
                "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
            },
            coords=coords_obs,
            attrs=attrs,
        )

        if keep_data_vars is not None:
            for var in list(obs_ds.data_vars):
                if var not in keep_data_vars:
                    obs_ds = obs_ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            obs_ds = pad_to_global_sid(
                obs_ds,
                keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
            )

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            obs_ds = strip_fillvalue(obs_ds)

        validate_dataset(obs_ds, required_vars=keep_data_vars)

        # ----------------------------------------------------------------
        # Build metadata dataset (verbatim from to_metadata_ds())
        # ----------------------------------------------------------------
        sorted_sids_meta = sorted(sid_props_meta)
        sid_to_idx_meta = {sid: i for i, sid in enumerate(sorted_sids_meta)}
        n_epochs_meta = len(records)
        n_sids_meta = len(sorted_sids_meta)

        sids_for_sv: dict[str, list[int]] = {}
        for sid in sorted_sids_meta:
            sv = sid_props_meta[sid]["sv"]
            sids_for_sv.setdefault(sv, []).append(sid_to_idx_meta[sid])

        theta_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        phi_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        rise_set_arr = np.full((n_epochs_meta, n_sids_meta), -1, dtype=np.int8)
        mp_corr_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        smoothing_corr_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        code_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        carr_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        lock_time_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
        cum_loss_cont_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        car_mp_corr_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )
        cn0_highres_arr = np.full(
            (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
        )

        pdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        hdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        vdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        n_sv_arr = np.full(n_epochs_meta, -1, dtype=np.int16)
        h_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        v_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        pvt_mode_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
        mean_corr_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        cpu_load_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
        temp_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
        rx_error_arr = np.full(n_epochs_meta, 0, dtype=np.int32)

        timestamps_meta: list[np.datetime64] = []

        for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
            timestamps_meta.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

            if dop is not None:
                try:
                    pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            if pvt is not None:
                try:
                    n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                    raw_hacc = int(pvt["HAccuracy"])
                    if raw_hacc != 65535:
                        h_acc_arr[t_idx] = raw_hacc * 0.01
                    raw_vacc = int(pvt["VAccuracy"])
                    if raw_vacc != 65535:
                        v_acc_arr[t_idx] = raw_vacc * 0.01
                    pvt_mode_arr[t_idx] = int(pvt["Mode"])
                    mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                    if np.isnan(pdop_arr[t_idx]):
                        pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                        hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                        vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
                except (KeyError, TypeError, ValueError):
                    pass

            if status is not None:
                try:
                    cpu_load_arr[t_idx] = int(status["CPULoad"])
                    raw_temp = int(status["Temperature"])
                    if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                        temp_arr[t_idx] = float(raw_temp - 100)
                    rx_error_arr[t_idx] = int(status["RxError"])
                except (KeyError, TypeError, ValueError):
                    pass

            for sat_info in satvis:
                try:
                    svid_raw = int(sat_info["SVID"])
                    sys_code, prn = decode_svid(svid_raw)
                    sv = f"{sys_code}{prn:02d}"
                    theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                    phi_deg = int(sat_info["Azimuth"]) * 0.01
                    rs = int(sat_info["RiseSet"])
                    for s_idx in sids_for_sv.get(sv, []):
                        theta_arr[t_idx, s_idx] = theta_deg
                        phi_arr[t_idx, s_idx] = phi_deg
                        rise_set_arr[t_idx, s_idx] = rs
                except (KeyError, TypeError, ValueError):
                    pass

            for ch in extra:
                try:
                    type_byte = int(ch["Type"])
                    info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                    sig_num = decode_signal_num(type_byte, info_byte)
                    rx_ch = int(ch["RxChannel"])
                    svid = obs_map.get((rx_ch, sig_num))
                    if svid is None:
                        continue
                    sig_def = SIGNAL_TABLE.get(sig_num)
                    if sig_def is None:
                        continue
                    sys_code2, prn2 = decode_svid(svid)
                    sv2 = f"{sys_code2}{prn2:02d}"
                    sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                    s_idx = sid_to_idx_meta.get(sid)
                    if s_idx is None:
                        continue
                    mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                    mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                    # SmoothingCorr: i2, scale 0.001 m/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    raw_sc = ch.get("SmoothingCorr")
                    if raw_sc is not None:
                        smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                    raw_cv = ch.get("CodeVar")
                    # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cv is not None and int(raw_cv) != 65535:
                        code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                    raw_rv = ch.get("CarrierVar")
                    # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_rv is not None and int(raw_rv) != 65535:
                        carr_var_arr[t_idx, s_idx] = float(raw_rv)
                    raw_lt = ch.get("LockTime")
                    # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_lt is not None and int(raw_lt) != 65535:
                        lock_time_arr[t_idx, s_idx] = float(raw_lt)
                    raw_clc = ch.get("CumLossCont")
                    # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_clc is not None:
                        cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                    raw_cmc = ch.get("CarMPCorr")
                    # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_cmc is not None:
                        car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                    raw_misc = ch.get("Misc")
                    # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                    # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                    if raw_misc is not None:
                        cn0_hr = int(raw_misc) & 0x07
                        cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
                except (KeyError, TypeError, ValueError):
                    pass

        freq_center_meta = np.asarray(
            [sid_props_meta[s]["freq_center"] for s in sorted_sids_meta],
            dtype=np.float32,
        )
        freq_min_meta = np.asarray(
            [sid_props_meta[s]["freq_min"] for s in sorted_sids_meta], dtype=np.float32
        )
        freq_max_meta = np.asarray(
            [sid_props_meta[s]["freq_max"] for s in sorted_sids_meta], dtype=np.float32
        )

        coords_meta: dict[str, Any] = {
            "epoch": ("epoch", timestamps_meta, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                sorted_sids_meta, dims=["sid"], attrs=COORDS_METADATA["sid"]
            ),
            "sv": (
                "sid",
                [sid_props_meta[s]["sv"] for s in sorted_sids_meta],
                COORDS_METADATA["sv"],
            ),
            "system": (
                "sid",
                [sid_props_meta[s]["system"] for s in sorted_sids_meta],
                COORDS_METADATA["system"],
            ),
            "band": (
                "sid",
                [sid_props_meta[s]["band"] for s in sorted_sids_meta],
                COORDS_METADATA["band"],
            ),
            "code": (
                "sid",
                [sid_props_meta[s]["code"] for s in sorted_sids_meta],
                COORDS_METADATA["code"],
            ),
            "freq_center": ("sid", freq_center_meta, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min_meta, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max_meta, COORDS_METADATA["freq_max"]),
            "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
            "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
            "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
            "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
            "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
            "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
            "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
            "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
            "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
            "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
            "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
        }

        attrs_meta = self._build_attrs()

        meta_ds = xr.Dataset(
            data_vars={
                "broadcast_theta": (
                    ["epoch", "sid"],
                    np.deg2rad(theta_arr),
                    _BROADCAST_THETA_ATTRS,
                ),
                "broadcast_phi": (
                    ["epoch", "sid"],
                    np.deg2rad(phi_arr),
                    _BROADCAST_PHI_ATTRS,
                ),
                "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
                "mp_correction_m": (
                    ["epoch", "sid"],
                    mp_corr_arr,
                    _MP_CORRECTION_ATTRS,
                ),
                "smoothing_corr_m": (
                    ["epoch", "sid"],
                    smoothing_corr_arr,
                    _SMOOTHING_CORR_ATTRS,
                ),
                "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
                "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
                "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
                "cum_loss_cont": (
                    ["epoch", "sid"],
                    cum_loss_cont_arr,
                    _CUM_LOSS_CONT_ATTRS,
                ),
                "car_mp_corr_cycles": (
                    ["epoch", "sid"],
                    car_mp_corr_arr,
                    _CAR_MP_CORR_ATTRS,
                ),
                "cn0_highres_correction": (
                    ["epoch", "sid"],
                    cn0_highres_arr,
                    _CN0_HIGHRES_CORRECTION_ATTRS,
                ),
            },
            coords=coords_meta,
            attrs=attrs_meta,
        )

        # Align meta_ds SID to obs_ds SID.
        # obs uses sid_props_obs (MeasEpoch); meta uses sid_props_meta (Type1/Type2)
        # — they can diverge.  Reindex fills missing SIDs with NaN.
        meta_ds = meta_ds.reindex(sid=obs_ds.sid, fill_value=np.nan)
        # rise_set is int8 with sentinel -1; NaN fill promotes to float — cast back.
        if meta_ds["rise_set"].dtype != np.int8:
            meta_ds["rise_set"] = meta_ds["rise_set"].fillna(-1).astype(np.int8)

        # Apply CN0HighRes correction from MeasExtra (Block 4000) to SNR.
        # CN0HighRes extends resolution from 0.25 to 0.03125 dB-Hz.
        # RefGuide-4.14.0, MeasExtra MeasExtraChannelSub.Misc bits 0-2, p.265.
        # Where MeasExtra was not logged the correction array is NaN → no-op.
        corr = meta_ds["cn0_highres_correction"].values  # (epoch, sid), NaN if absent
        snr_raw_values = obs_ds["SNR"].values.copy()  # preserve 0.25 dB-Hz original
        snr_corrected = snr_raw_values.copy()
        valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
        snr_corrected[valid] += corr[valid]
        snr_attrs = dict(obs_ds["SNR"].attrs)
        snr_attrs["comment"] = (
            snr_attrs.get("comment", "")
            + " CN0HighRes correction from MeasExtra (Block 4000, p.265) applied where"
            " available, improving resolution from 0.25 to 0.03125 dB-Hz."
        ).lstrip()
        obs_ds["SNR"] = xr.DataArray(
            snr_corrected,
            dims=["epoch", "sid"],
            coords=obs_ds["SNR"].coords,
            attrs=snr_attrs,
        )

        if store_raw_observables:
            # ------------------------------------------------------------------
            # Add "physically raw" observables: pre-correction versions of SNR,
            # pseudorange, and carrier phase.  NaN where MeasExtra was absent.
            # Gated by store_raw_observables (config: store_sbf_raw_observables).
            # ------------------------------------------------------------------

            # SNR_raw: 0.25 dB-Hz resolution, before CN0HighRes extension.
            obs_ds["SNR_raw"] = xr.DataArray(
                snr_raw_values,
                dims=["epoch", "sid"],
                coords=obs_ds["SNR"].coords,
                attrs=_SNR_RAW_ATTRS,
            )

            # Pseudorange_unsmoothed: Hatch-filter correction removed.
            smooth = meta_ds["smoothing_corr_m"].values
            pr_vals = obs_ds["Pseudorange"].values
            pr_unsmoothed = np.where(
                ~np.isnan(smooth), pr_vals + smooth, np.nan
            ).astype(np.float64)
            obs_ds["Pseudorange_unsmoothed"] = xr.DataArray(
                pr_unsmoothed,
                dims=["epoch", "sid"],
                coords=obs_ds["Pseudorange"].coords,
                attrs=_PSEUDORANGE_UNSMOOTHED_ATTRS,
            )

            # Pseudorange_raw: both Hatch-filter and multipath corrections removed.
            mp = meta_ds["mp_correction_m"].values
            available = ~np.isnan(smooth) & ~np.isnan(mp)
            pr_raw = np.where(available, pr_vals + smooth + mp, np.nan).astype(
                np.float64
            )
            obs_ds["Pseudorange_raw"] = xr.DataArray(
                pr_raw,
                dims=["epoch", "sid"],
                coords=obs_ds["Pseudorange"].coords,
                attrs=_PSEUDORANGE_RAW_ATTRS,
            )

            # Phase_raw: carrier multipath correction removed.
            car_mp = meta_ds["car_mp_corr_cycles"].values
            ph_vals = obs_ds["Phase"].values
            ph_raw = np.where(~np.isnan(car_mp), ph_vals + car_mp, np.nan).astype(
                np.float64
            )
            obs_ds["Phase_raw"] = xr.DataArray(
                ph_raw,
                dims=["epoch", "sid"],
                coords=obs_ds["Phase"].coords,
                attrs=_PHASE_RAW_ATTRS,
            )

        return obs_ds, {"sbf_obs": meta_ds}
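
The NaN-gated correction pattern used above (apply a per-element offset only where both the observation and the correction are finite) can be sketched with toy arrays; the values below are made up for illustration:

```python
import numpy as np

# Toy SNR matrix (epoch, sid) at 0.25 dB-Hz resolution; NaN = no observation.
snr = np.array([[42.25, np.nan], [41.50, 38.75]])
# CN0HighRes corrections from MeasExtra; NaN where the block was not logged.
corr = np.array([[0.03125, 0.0], [np.nan, -0.0625]])

corrected = snr.copy()
valid = ~np.isnan(corrected) & ~np.isnan(corr)
corrected[valid] += corr[valid]  # no-op wherever either operand is NaN

print(corrected)  # [[42.28125      nan] [41.5     38.6875]]
```

Where the correction is NaN the original 0.25 dB-Hz value survives unchanged, which is exactly the "applied where available" behaviour noted in the SNR comment attribute.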

    # ------------------------------------------------------------------
    # Private decoding helpers
    # ------------------------------------------------------------------

    def _decode_epoch(  # pylint: disable=too-many-locals
        self,
        data: dict[str, Any],
        freq_nr_cache: dict[int, int],
        delta_ls: int,
    ) -> SbfEpoch | None:
        """Decode one raw MeasEpoch dict into an :class:`SbfEpoch`.

        Parameters
        ----------
        data : dict
            Raw block dict from ``sbf_parser``.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr mapping for GLONASS FDMA frequency lookup.
        delta_ls : int
            GPS - UTC leap second offset.

        Returns
        -------
        SbfEpoch or None
            Decoded epoch, or ``None`` if decoding fails (logged as warning).
        """
        tow_ms = int(data["TOW"])
        wn = int(data["WNc"])
        timestamp = _tow_wn_to_utc(tow_ms, wn, delta_ls)
        common_flags = int(data["CommonFlags"])
        cum_clk_jumps = int(data["CumClkJumps"])

        observations: list[SbfSignalObs] = []

        for t1 in data.get("Type_1", []):
            t1_obs, t1_freq = self._decode_type1(t1, freq_nr_cache)
            if t1_obs is not None:
                observations.append(t1_obs)
                # Decode linked Type2 slave observations
                pr1 = t1_obs.pseudorange
                d1 = t1_obs.doppler
                if pr1 is not None and d1 is not None and t1_freq is not None:
                    for t2 in t1.get("Type_2", []):
                        t2_obs = self._decode_type2(
                            t2, int(t1["SVID"]), pr1, d1, t1_freq, freq_nr_cache
                        )
                        if t2_obs is not None:
                            observations.append(t2_obs)

        return SbfEpoch(
            tow_ms=tow_ms,
            wn=wn,
            timestamp=timestamp,
            common_flags=common_flags,
            cum_clk_jumps=cum_clk_jumps,
            observations=tuple(observations),
        )

    def _resolve_freq(
        self,
        sig_num: int,
        svid: int,
        freq_nr_cache: dict[int, int],
    ) -> pint.Quantity | None:
        """Return carrier frequency as a pint Quantity, or None if unavailable.

        Parameters
        ----------
        sig_num : int
            Signal type number (0-39).
        svid : int
            Septentrio internal SVID.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        pint.Quantity or None
            Carrier frequency (in MHz), or ``None`` if GLONASS and FreqNr
            not yet known, or signal not in table (e.g. L-Band MSS).
        """
        if sig_num in FDMA_SIGNAL_NUMS:
            freq_nr = freq_nr_cache.get(svid)
            if freq_nr is None:
                return None
            return glonass_freq_hz(sig_num, freq_nr)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            return None
        return sig_def.freq  # None for L-Band MSS (sig 23)
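
For FDMA signals, ``glonass_freq_hz`` derives the carrier from the per-satellite frequency channel. A standalone sketch of the standard GLONASS frequency plan, assuming the SBF convention that ``FreqNr`` stores the channel number ``k`` with an offset of +8 (an assumption here; verify against the receiver reference guide):

```python
# GLONASS FDMA carrier frequencies (standard L1/L2 plan).
# Assumption: FreqNr = k + 8, so k = freq_nr - 8 with k in [-7, +6].
def glonass_l1_mhz(freq_nr: int) -> float:
    k = freq_nr - 8
    return 1602.0 + k * 0.5625  # 562.5 kHz channel spacing on L1

def glonass_l2_mhz(freq_nr: int) -> float:
    k = freq_nr - 8
    return 1246.0 + k * 0.4375  # 437.5 kHz channel spacing on L2

print(glonass_l1_mhz(8))  # k = 0  -> 1602.0
print(glonass_l1_mhz(1))  # k = -7 -> 1598.0625
```

This is why ``_resolve_freq`` must return ``None`` until the SVID appears in the FreqNr cache: without ``k`` the carrier frequency is simply unknown.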

    def _decode_type1(  # pylint: disable=too-many-locals
        self,
        t1: dict[str, Any],
        freq_nr_cache: dict[int, int],
    ) -> tuple[SbfSignalObs | None, pint.Quantity | None]:
        """Decode a Type1 sub-block dict to an SbfSignalObs.

        Parameters
        ----------
        t1 : dict
            Raw Type1 sub-block dict.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        obs : SbfSignalObs or None
            Decoded observation, or ``None`` for unknown signals.
        freq : pint.Quantity or None
            Carrier frequency used (needed for Type2 Doppler scaling).
        """
        svid = int(t1["SVID"])
        type_byte = int(t1["Type"])
        obs_info = int(t1["ObsInfo"])
        sig_num = decode_signal_num(type_byte, obs_info)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            log.debug("sbf_unknown_signal", svid=svid, sig_num=sig_num)
            return None, None

        system, prn = decode_svid(svid)
        freq = self._resolve_freq(sig_num, svid, freq_nr_cache)

        misc = int(t1["Misc"])
        code_lsb = int(t1["CodeLSB"])
        pr = pseudorange_m(misc, code_lsb)
        dop = doppler_hz(int(t1["Doppler"]))
        carrier_msb = int(t1["CarrierMSB"])
        carrier_lsb = int(t1["CarrierLSB"])

        ph: float | None = None
        if pr is not None and freq is not None:
            ph = phase_cycles(pr, carrier_msb, carrier_lsb, freq)

        obs = SbfSignalObs(
            svid=svid,
            system=system,
            prn=prn,
            signal_num=sig_num,
            signal_type=sig_def.signal_type,
            rx_channel=int(t1["RxChannel"]),
            lock_time_ms=int(t1["LockTime"]),
            cn0=cn0_dbhz(int(t1["CN0"]), sig_num),
            pseudorange=pr,
            doppler=dop,
            phase_cycles=ph,
            obs_info=obs_info,
            is_type2=False,
        )
        return obs, freq

    def _decode_type2(  # pylint: disable=too-many-arguments,too-many-locals,too-many-positional-arguments
        self,
        t2: dict[str, Any],
        svid: int,
        pr1: pint.Quantity,
        d1: pint.Quantity,
        freq1: pint.Quantity,
        freq_nr_cache: dict[int, int],
    ) -> SbfSignalObs | None:
        """Decode a Type2 sub-block dict to an SbfSignalObs.

        Parameters
        ----------
        t2 : dict
            Raw Type2 sub-block dict.
        svid : int
            SVID of the parent Type1 sub-block.
        pr1 : pint.Quantity
            Type1 pseudorange in metres.
        d1 : pint.Quantity
            Type1 Doppler in Hz.
        freq1 : pint.Quantity
            Type1 carrier frequency.
        freq_nr_cache : dict of {int: int}
            Current SVID → FreqNr map.

        Returns
        -------
        SbfSignalObs or None
            Decoded observation, or ``None`` for unknown signals.
        """
        type_byte = int(t2["Type"])
        obs_info = int(t2["ObsInfo"])
        sig_num = decode_signal_num(type_byte, obs_info)

        sig_def = SIGNAL_TABLE.get(sig_num)
        if sig_def is None:
            log.debug("sbf_unknown_type2_signal", svid=svid, sig_num=sig_num)
            return None

        system, prn = decode_svid(svid)
        freq2 = self._resolve_freq(sig_num, svid, freq_nr_cache)

        code_msb_signed, doppler_msb_signed = decode_offsets_msb(int(t2["OffsetMSB"]))
        code_offset_lsb = int(t2["CodeOffsetLSB"])
        doppler_offset_lsb = int(t2["DopplerOffsetLSB"])
        carrier_msb = int(t2["CarrierMSB"])
        carrier_lsb = int(t2["CarrierLSB"])

        pr2 = pr2_m(pr1, code_msb_signed, code_offset_lsb)

        d2: pint.Quantity | None = None
        if freq2 is not None:
            d2 = doppler2_hz(d1, doppler_msb_signed, doppler_offset_lsb, freq2, freq1)

        ph: float | None = None
        if pr2 is not None and freq2 is not None:
            ph = phase_cycles(pr2, carrier_msb, carrier_lsb, freq2)

        return SbfSignalObs(
            svid=svid,
            system=system,
            prn=prn,
            signal_num=sig_num,
            signal_type=sig_def.signal_type,
            rx_channel=int(t2.get("RxChannel", 0)),
            lock_time_ms=int(t2["LockTime"]),
            cn0=cn0_dbhz(int(t2["CN0"]), sig_num),
            pseudorange=pr2,
            doppler=d2,
            phase_cycles=ph,
            obs_info=obs_info,
            is_type2=True,
        )

    def __repr__(self) -> str:
        """Return a short string representation."""
        return f"SbfReader(file='{self.fpath.name}', epochs={self.num_epochs})"

file_hash cached property

SHA-256 hex digest of the file (first 16 characters).

Returns

str 16-character hexadecimal prefix of the SHA-256 hash.
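
A minimal stand-in for this property using only the standard library; the chunked read and the helper name ``file_hash_prefix`` are illustrative assumptions, not the reader's actual implementation:

```python
import hashlib
from pathlib import Path

def file_hash_prefix(fpath: Path, n: int = 16) -> str:
    """First n hex characters of the file's SHA-256 digest."""
    h = hashlib.sha256()
    with open(fpath, "rb") as f:
        # Read in 1 MiB chunks so large SBF files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:n]
```

A 16-hex-character (64-bit) prefix is ample for deduplicating a store of observation files.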

start_time cached property

Return the timestamp of the first decoded epoch.

Returns

datetime Timezone-aware UTC datetime of the first observation epoch.

Raises

LookupError If the file contains no decodable epochs.

end_time cached property

Return the timestamp of the last decoded epoch.

Returns

datetime Timezone-aware UTC datetime of the last observation epoch.

Raises

LookupError If the file contains no decodable epochs.

systems cached property

Return sorted list of GNSS system codes present in the file.

Returns

list of str Sorted list of RINEX system letters (e.g. ["E", "G", "R"]).

num_satellites cached property

Return the number of unique satellites observed in the file.

Returns

int Count of unique system + PRN pairs across all epochs.

num_epochs cached property

Count the number of MeasEpoch blocks in the file.

Returns

int Total MeasEpoch block count (one per observation epoch).

Notes

Scans the entire file once; result is cached.

header cached property

Parse the first ReceiverSetup block in the file.

Returns

SbfHeader Receiver metadata.

Raises

LookupError If no ReceiverSetup block is found.

iter_epochs()

Iterate over decoded MeasEpoch blocks.

Yields decoded :class:SbfEpoch objects with all signal observations converted to physical units as :class:pint.Quantity.

Yields

SbfEpoch One decoded observation epoch.

Notes

- The file is scanned from start to finish on each call.
- The :attr:`_freq_nr_cache` is pre-populated from ALL ChannelStatus blocks before the first call, so all GLONASS FDMA epochs have accurate carrier frequencies.
- delta_ls (leap seconds) is taken from the most recent ReceiverTime block; defaults to 18 if none has been seen yet.
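
The TOW/WN timestamp arithmetic behind these epochs can be sketched as a simplified stand-in for the ``_tow_wn_to_utc`` helper; the GPS epoch (1980-01-06) and the leap-second subtraction follow the standard GPS-to-UTC convention rather than being copied from the source:

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

def tow_wn_to_utc(tow_ms: int, wn: int, delta_ls: int) -> datetime:
    """GPS time-of-week (ms) + continuous week number -> UTC datetime."""
    gps_time = GPS_EPOCH + timedelta(weeks=wn, milliseconds=tow_ms)
    # GPS time runs ahead of UTC by the accumulated leap seconds.
    return gps_time - timedelta(seconds=delta_ls)
```

WNc in SBF is a continuous week count (no 1024-week rollover), so no rollover handling is needed here.
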
Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def iter_epochs(self) -> Iterator[SbfEpoch]:
    """Iterate over decoded MeasEpoch blocks.

    Yields decoded :class:`SbfEpoch` objects with all signal observations
    converted to physical units as :class:`pint.Quantity`.

    Yields
    ------
    SbfEpoch
        One decoded observation epoch.

    Notes
    -----
    - The file is scanned from start to finish on each call.
    - The :attr:`_freq_nr_cache` is pre-populated from ALL ChannelStatus
      blocks before the first call, so all GLONASS FDMA epochs have
      accurate carrier frequencies.
    - ``delta_ls`` (leap seconds) is taken from the most recent
      ReceiverTime block; defaults to 18 if none has been seen yet.
    """
    parser = sbf_parser.SbfParser()
    freq_nr_cache: dict[int, int] = self._freq_nr_cache.copy()
    delta_ls: int = _DEFAULT_DELTA_LS

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        freq_nr_cache[svid] = int(sat["FreqNr"])

            case "MeasEpoch":
                epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                if epoch is not None:
                    yield epoch

to_ds(keep_data_vars=None, pad_global_sid=True, strip_fillval=True, **kwargs)

Convert SBF observations to an (epoch, sid) xarray Dataset.

Produces the same structure as :class:~canvod.readers.rinex.v3_04.Rnxv3Obs and passes :func:~canvod.readers.base.validate_dataset.

Parameters

keep_data_vars : list of str, optional
    Data variables to retain. If None, all five variables are kept: SNR, Pseudorange, Phase, Doppler, SSI. Note: LLI is not produced — SBF has no loss-of-lock indicator.
pad_global_sid : bool, default True
    If True, pads the dataset to the global SID space via :func:canvod.auxiliary.preprocessing.pad_to_global_sid.
strip_fillval : bool, default True
    If True, removes fill values via :func:canvod.auxiliary.preprocessing.strip_fillvalue.
**kwargs
    Ignored (for ABC compatibility).

Returns

xr.Dataset Dataset with dimensions (epoch, sid) that passes :func:~canvod.readers.base.validate_dataset.

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    pad_global_sid: bool = True,
    strip_fillval: bool = True,
    **kwargs: object,
) -> xr.Dataset:
    """Convert SBF observations to an ``(epoch, sid)`` xarray Dataset.

    Produces the same structure as :class:`~canvod.readers.rinex.v3_04.Rnxv3Obs`
    and passes :func:`~canvod.readers.base.validate_dataset`.

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to retain.  If ``None``, all five variables are
        kept: ``SNR``, ``Pseudorange``, ``Phase``, ``Doppler``, ``SSI``.
        Note: ``LLI`` is not produced — SBF has no loss-of-lock indicator.
    pad_global_sid : bool, default True
        If ``True``, pads the dataset to the global SID space via
        :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.
    strip_fillval : bool, default True
        If ``True``, removes fill values via
        :func:`canvod.auxiliary.preprocessing.strip_fillvalue`.
    **kwargs
        Ignored (for ABC compatibility).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions ``(epoch, sid)`` that passes
        :func:`~canvod.readers.base.validate_dataset`.
    """
    import math

    freq_nr_cache = self._freq_nr_cache.copy()

    # --- Single pass: collect timestamps, SID properties, and per-epoch obs ---
    # Stores per-epoch obs as dicts (SID → value) so we only scan the file once.
    # Array construction happens afterwards in fast in-memory loops.
    sid_props: dict[str, dict[str, Any]] = {}
    timestamps: list[np.datetime64] = []
    # Per-epoch accumulator: list of (snr_dict, pr_dict, ph_dict, dop_dict)
    epoch_rows: list[
        tuple[
            dict[str, float], dict[str, float], dict[str, float], dict[str, float]
        ]
    ] = []

    for epoch in self.iter_epochs():
        ts_np = np.datetime64(epoch.timestamp.replace(tzinfo=None), "ns")
        timestamps.append(ts_np)

        e_snr: dict[str, float] = {}
        e_pr: dict[str, float] = {}
        e_ph: dict[str, float] = {}
        e_dop: dict[str, float] = {}

        for obs in epoch.observations:
            props = _sid_props_from_obs(obs.svid, obs.signal_num, freq_nr_cache)
            if props is None:
                continue
            sid = props["sid"]
            if sid not in sid_props:
                sid_props[sid] = props
            if obs.cn0 is not None:
                e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
            if obs.pseudorange is not None:
                e_pr[sid] = float(obs.pseudorange.to(UREG.meter).magnitude)
            if obs.phase_cycles is not None:
                e_ph[sid] = obs.phase_cycles
            if obs.doppler is not None:
                e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)

        epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

    sorted_sids = sorted(sid_props)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(timestamps)
    n_sids = len(sorted_sids)

    # Allocate arrays (LLI is dropped — SBF has no loss-of-lock indicator)
    snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
    pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
    ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
    dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
    ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

    for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
        for sid, val in e_snr.items():
            snr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_pr.items():
            pr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_ph.items():
            ph_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_dop.items():
            dop_arr[t_idx, sid_to_idx[sid]] = val

    # Build coordinate arrays
    freq_center = np.asarray(
        [sid_props[s]["freq_center"] for s in sorted_sids],
        dtype=DTYPES["freq_center"],
    )
    freq_min = np.asarray(
        [sid_props[s]["freq_min"] for s in sorted_sids], dtype=DTYPES["freq_min"]
    )
    freq_max = np.asarray(
        [sid_props[s]["freq_max"] for s in sorted_sids], dtype=DTYPES["freq_max"]
    )

    coords: dict[str, Any] = {
        "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    attrs = cast(dict[str, Any], self._build_attrs())

    # Add ECEF position from ReceiverSetup header for pipeline compatibility.
    # ECEFPosition.from_ds_metadata() reads "APPROX POSITION X/Y/Z".
    try:
        import pymap3d as pm

        hdr = self.header
        lat_deg = math.degrees(hdr.latitude_rad)
        lon_deg = math.degrees(hdr.longitude_rad)
        h_m = float(hdr.height_m.to(UREG.meter).magnitude)
        x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
        attrs["APPROX POSITION X"] = float(x)
        attrs["APPROX POSITION Y"] = float(y)
        attrs["APPROX POSITION Z"] = float(z)
    except (LookupError, AttributeError):
        pass  # SBF file without a ReceiverSetup block

    ds = xr.Dataset(
        data_vars={
            "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
            "Pseudorange": (
                ["epoch", "sid"],
                pr_arr,
                OBSERVABLES_METADATA["Pseudorange"],
            ),
            "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
            "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
            "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
        },
        coords=coords,
        attrs=attrs,
    )

    # Post-process
    if keep_data_vars is not None:
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        ds = pad_to_global_sid(
            ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    validate_dataset(ds, required_vars=keep_data_vars)
    return ds

to_metadata_ds(pad_global_sid=True, **kwargs)

Decode SBF metadata blocks to an (epoch, sid) xarray Dataset.

Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and MeasExtra blocks in a single file scan.

Parameters

pad_global_sid : bool, default True
    If True, pads to the global SID space via :func:canvod.auxiliary.preprocessing.pad_to_global_sid.

Returns

xr.Dataset Dataset with dimensions (epoch, sid). Epoch-level scalars (PDOP, NrSV, …) are 1-D (epoch,) coordinates. Satellite geometry (theta, phi) and signal quality (MPCorrection, …) are (epoch, sid) data variables.
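
The per-satellite-to-per-signal broadcast used for the theta/phi geometry can be sketched without xarray; SatVisibility reports one elevation/azimuth pair per satellite, which is copied to every SID of that satellite (the SIDs and angles below are toy values):

```python
import math

# Each sid is "SV|band|code"; one (theta, phi) per SV is broadcast to all sids.
sids = ["G01|L1|C", "G01|L2|W", "E05|E1|C"]
satvis = {"G01": (32.5, 118.0)}  # sv -> (theta_deg, phi_deg), toy values

nan_pair = (math.nan, math.nan)
theta = [satvis.get(sid.split("|")[0], nan_pair)[0] for sid in sids]
print(theta)  # [32.5, 32.5, nan]
```

Satellites with no SatVisibility entry in that epoch keep the NaN fill, matching the pre-allocated ``np.full(..., np.nan)`` arrays in the source.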

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_metadata_ds(
    self, pad_global_sid: bool = True, **kwargs: object
) -> xr.Dataset:
    """Decode SBF metadata blocks to an ``(epoch, sid)`` xarray Dataset.

    Decodes PVTGeodetic, DOP, ReceiverStatus, SatVisibility, and
    MeasExtra blocks in a single file scan.

    Parameters
    ----------
    pad_global_sid : bool, default True
        If ``True``, pads to the global SID space via
        :func:`canvod.auxiliary.preprocessing.pad_to_global_sid`.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions ``(epoch, sid)``.  Epoch-level scalars
        (PDOP, NrSV, …) are 1-D ``(epoch,)`` coordinates.  Satellite
        geometry (theta, phi) and signal quality (MPCorrection, …) are
        ``(epoch, sid)`` data variables.
    """
    parser = sbf_parser.SbfParser()
    freq_nr_cache = self._freq_nr_cache.copy()

    pending: dict[str, Any] = {
        "pvt": None,
        "dop": None,
        "status": None,
        "satvis": [],
        "extra": [],
    }

    # Each record: (ts, pvt, dop, status, satvis, extra, obs_map)
    records: list[tuple[Any, ...]] = []

    # sid discovery — same logic as to_ds() pass 1
    sid_props: dict[str, dict[str, Any]] = {}

    delta_ls: int = _DEFAULT_DELTA_LS

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid_cs = int(sat["SVID"])
                    if svid_cs != 0:
                        freq_nr_cache[svid_cs] = int(sat["FreqNr"])

            case "PVTGeodetic":
                pending["pvt"] = data

            case "DOP":
                pending["dop"] = data

            case "ReceiverStatus":
                pending["status"] = data

            case "SatVisibility":
                pending["satvis"] = list(data.get("SatInfo", []))

            case "MeasExtra":
                pending["extra"] = list(data.get("MeasExtraChannel", []))

            case "MeasEpoch":
                tow_ms = int(data["TOW"])
                wn = int(data["WNc"])
                ts = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                obs_map = _build_obs_map(data)

                # Discover sids from Type1 and Type2 sub-blocks
                for t1 in data.get("Type_1", []):
                    svid1 = int(t1["SVID"])
                    props1 = _sid_props_from_obs(
                        svid1,
                        decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                        freq_nr_cache,
                    )
                    if props1 is not None and props1["sid"] not in sid_props:
                        sid_props[props1["sid"]] = props1

                    for t2 in t1.get("Type_2", []):
                        props2 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if props2 is not None and props2["sid"] not in sid_props:
                            sid_props[props2["sid"]] = props2

                records.append(
                    (
                        ts,
                        pending["pvt"],
                        pending["dop"],
                        pending["status"],
                        list(pending["satvis"]),
                        list(pending["extra"]),
                        obs_map,
                    )
                )
                pending = {
                    "pvt": None,
                    "dop": None,
                    "status": None,
                    "satvis": [],
                    "extra": [],
                }

    # Build index structures
    sorted_sids = sorted(sid_props)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(records)
    n_sids = len(sorted_sids)

    # sv → list of sid indices (for SatVisibility broadcasting)
    sids_for_sv: dict[str, list[int]] = {}
    for sid in sorted_sids:
        sv = sid_props[sid]["sv"]
        sids_for_sv.setdefault(sv, []).append(sid_to_idx[sid])

    # (epoch, sid) data variable arrays
    theta_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    phi_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    rise_set_arr = np.full((n_epochs, n_sids), -1, dtype=np.int8)
    mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    smoothing_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    code_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    carr_var_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    lock_time_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    cum_loss_cont_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    car_mp_corr_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)
    cn0_highres_arr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

    # (epoch,) scalar coordinate arrays
    pdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    hdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    vdop_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    n_sv_arr = np.full(n_epochs, -1, dtype=np.int16)
    h_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    v_acc_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    pvt_mode_arr = np.full(n_epochs, -1, dtype=np.int8)
    mean_corr_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    cpu_load_arr = np.full(n_epochs, -1, dtype=np.int8)
    temp_arr = np.full(n_epochs, np.nan, dtype=np.float32)
    rx_error_arr = np.full(n_epochs, 0, dtype=np.int32)

    timestamps: list[np.datetime64] = []

    # Fill arrays from records
    for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
        timestamps.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

        # DOP block → pdop, hdop, vdop
        if dop is not None:
            try:
                pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        # PVTGeodetic → n_sv, accuracy, mode, correction age
        if pvt is not None:
            try:
                n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                raw_hacc = int(pvt["HAccuracy"])
                if raw_hacc != 65535:
                    h_acc_arr[t_idx] = raw_hacc * 0.01
                raw_vacc = int(pvt["VAccuracy"])
                if raw_vacc != 65535:
                    v_acc_arr[t_idx] = raw_vacc * 0.01
                pvt_mode_arr[t_idx] = int(pvt["Mode"])
                mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                # Also pick up DOP from PVTGeodetic if DOP block absent
                if np.isnan(pdop_arr[t_idx]):
                    pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        # ReceiverStatus → cpu_load, temperature, rx_error
        if status is not None:
            try:
                cpu_load_arr[t_idx] = int(status["CPULoad"])
                raw_temp = int(status["Temperature"])
                if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                    temp_arr[t_idx] = float(raw_temp - 100)
                rx_error_arr[t_idx] = int(status["RxError"])
            except (KeyError, TypeError, ValueError):
                pass

        # SatVisibility → broadcast theta/phi to all sids for that sv
        for sat_info in satvis:
            try:
                svid_raw = int(sat_info["SVID"])
                sys_code, prn = decode_svid(svid_raw)
                sv = f"{sys_code}{prn:02d}"
                theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                phi_deg = int(sat_info["Azimuth"]) * 0.01
                rs = int(sat_info["RiseSet"])
                for s_idx in sids_for_sv.get(sv, []):
                    theta_arr[t_idx, s_idx] = theta_deg
                    phi_arr[t_idx, s_idx] = phi_deg
                    rise_set_arr[t_idx, s_idx] = rs
            except (KeyError, TypeError, ValueError):
                pass

        # MeasExtra → per-(epoch, sid) signal quality
        for ch in extra:
            try:
                type_byte = int(ch["Type"])
                info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                sig_num = decode_signal_num(type_byte, info_byte)
                rx_ch = int(ch["RxChannel"])
                svid = obs_map.get((rx_ch, sig_num))
                if svid is None:
                    continue
                sig_def = SIGNAL_TABLE.get(sig_num)
                if sig_def is None:
                    continue
                sys_code2, prn2 = decode_svid(svid)
                sv2 = f"{sys_code2}{prn2:02d}"
                sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                s_idx = sid_to_idx.get(sid)
                if s_idx is None:
                    continue
                mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                # SmoothingCorr: i2, scale 0.001 m/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                raw_sc = ch.get("SmoothingCorr")
                if raw_sc is not None:
                    smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                raw_cv = ch.get("CodeVar")
                # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cv is not None and int(raw_cv) != 65535:
                    code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                raw_rv = ch.get("CarrierVar")
                # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_rv is not None and int(raw_rv) != 65535:
                    carr_var_arr[t_idx, s_idx] = float(raw_rv)
                raw_lt = ch.get("LockTime")
                # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_lt is not None and int(raw_lt) != 65535:
                    lock_time_arr[t_idx, s_idx] = float(raw_lt)
                raw_clc = ch.get("CumLossCont")
                # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_clc is not None:
                    cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                raw_cmc = ch.get("CarMPCorr")
                # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cmc is not None:
                    car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                raw_misc = ch.get("Misc")
                # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_misc is not None:
                    cn0_hr = int(raw_misc) & 0x07
                    cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
            except (KeyError, TypeError, ValueError):
                pass

    # Build Dataset
    freq_center = np.asarray(
        [sid_props[s]["freq_center"] for s in sorted_sids], dtype=np.float32
    )
    freq_min = np.asarray(
        [sid_props[s]["freq_min"] for s in sorted_sids], dtype=np.float32
    )
    freq_max = np.asarray(
        [sid_props[s]["freq_max"] for s in sorted_sids], dtype=np.float32
    )

    coords: dict[str, Any] = {
        "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array([sid_props[s]["system"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        # Epoch-level scalars (1-D over epoch)
        "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
        "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
        "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
        "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
        "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
        "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
        "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
        "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
        "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
        "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
        "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
    }

    attrs = self._build_attrs()

    ds = xr.Dataset(
        data_vars={
            "broadcast_theta": (
                ["epoch", "sid"],
                np.deg2rad(theta_arr),
                _BROADCAST_THETA_ATTRS,
            ),
            "broadcast_phi": (
                ["epoch", "sid"],
                np.deg2rad(phi_arr),
                _BROADCAST_PHI_ATTRS,
            ),
            "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
            "mp_correction_m": (
                ["epoch", "sid"],
                mp_corr_arr,
                _MP_CORRECTION_ATTRS,
            ),
            "smoothing_corr_m": (
                ["epoch", "sid"],
                smoothing_corr_arr,
                _SMOOTHING_CORR_ATTRS,
            ),
            "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
            "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
            "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
            "cum_loss_cont": (
                ["epoch", "sid"],
                cum_loss_cont_arr,
                _CUM_LOSS_CONT_ATTRS,
            ),
            "car_mp_corr_cycles": (
                ["epoch", "sid"],
                car_mp_corr_arr,
                _CAR_MP_CORR_ATTRS,
            ),
            "cn0_highres_correction": (
                ["epoch", "sid"],
                cn0_highres_arr,
                _CN0_HIGHRES_CORRECTION_ATTRS,
            ),
        },
        coords=coords,
        attrs=attrs,
    )

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        ds = pad_to_global_sid(
            ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    return ds

to_ds_and_auxiliary(keep_data_vars=None, pad_global_sid=True, strip_fillval=True, store_raw_observables=True, **kwargs)

Single file scan producing both the obs dataset and the SBF metadata dataset.

Performs ONE parser.read() pass, collecting MeasEpoch observations and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously. to_ds() and to_metadata_ds() remain unchanged for standalone use.

Parameters

keep_data_vars : list of str, optional
    Data variables to retain in the obs dataset.
pad_global_sid : bool, default True
    Pad obs dataset to the global SID space.
strip_fillval : bool, default True
    Strip fill values from the obs dataset.
store_raw_observables : bool, default True
    Add pre-correction "raw" observable variables to the obs dataset: SNR_raw, Pseudorange_unsmoothed, Pseudorange_raw, Phase_raw. Set to False to reduce dataset size when these are not needed.
**kwargs
    Forwarded to pad_to_global_sid (e.g. keep_sids).

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"sbf_obs": meta_ds}).
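
The metadata-side epoch timestamps come from `_tow_wn_to_utc(tow_ms, wn, delta_ls)`. A hedged sketch of that conversion, assuming the standard GPS epoch (1980-01-06) and that `DeltaLS` is the GPS-UTC leap-second count — the actual helper may handle edge cases differently:

```python
from datetime import datetime, timedelta, timezone

# Standard GPS epoch; WNc in SBF block headers is the continuous week number.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def tow_wn_to_utc(tow_ms: int, wn: int, delta_ls: int) -> datetime:
    # GPS time = epoch + whole weeks + time-of-week in milliseconds;
    # UTC = GPS time minus the GPS-UTC leap-second offset (DeltaLS).
    gps_time = GPS_EPOCH + timedelta(weeks=wn, milliseconds=tow_ms)
    return gps_time - timedelta(seconds=delta_ls)
```

For instance, TOW 0 of week 1 with DeltaLS 18 lands 18 seconds before 1980-01-13 00:00:00 UTC.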

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    pad_global_sid: bool = True,
    strip_fillval: bool = True,
    store_raw_observables: bool = True,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Single file scan producing both the obs dataset and the SBF metadata dataset.

    Performs ONE ``parser.read()`` pass, collecting MeasEpoch observations
    and PVTGeodetic/DOP/SatVisibility/MeasExtra metadata blocks simultaneously.
    ``to_ds()`` and ``to_metadata_ds()`` remain unchanged for standalone use.

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to retain in the obs dataset.
    pad_global_sid : bool, default True
        Pad obs dataset to the global SID space.
    strip_fillval : bool, default True
        Strip fill values from the obs dataset.
    store_raw_observables : bool, default True
        Add pre-correction "raw" observable variables to the obs dataset:
        ``SNR_raw``, ``Pseudorange_unsmoothed``, ``Pseudorange_raw``,
        ``Phase_raw``.  Set to ``False`` to reduce dataset size when these
        are not needed.
    **kwargs
        Forwarded to ``pad_to_global_sid`` (e.g. ``keep_sids``).

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"sbf_obs": meta_ds})``.
    """
    import math

    parser = sbf_parser.SbfParser()
    freq_nr_cache = self._freq_nr_cache.copy()
    delta_ls: int = _DEFAULT_DELTA_LS

    # Separate sid discovery for obs (matches to_ds) and metadata (matches to_metadata_ds)
    sid_props_obs: dict[str, dict[str, Any]] = {}
    sid_props_meta: dict[str, dict[str, Any]] = {}

    # Obs-side accumulators (same as to_ds)
    timestamps_obs: list[np.datetime64] = []
    epoch_rows: list[
        tuple[
            dict[str, float], dict[str, float], dict[str, float], dict[str, float]
        ]
    ] = []

    # Metadata-side accumulators (same as to_metadata_ds)
    pending: dict[str, Any] = {
        "pvt": None,
        "dop": None,
        "status": None,
        "satvis": [],
        "extra": [],
    }
    records: list[tuple[Any, ...]] = []

    for name, data in parser.read(str(self.fpath)):
        match name:
            case "ReceiverTime":
                delta_ls = int(data["DeltaLS"])

            case "ChannelStatus":
                for sat in data.get("ChannelSatInfo", []):
                    svid = int(sat["SVID"])
                    if svid != 0:
                        freq_nr_cache[svid] = int(sat["FreqNr"])

            case "PVTGeodetic":
                pending["pvt"] = data

            case "DOP":
                pending["dop"] = data

            case "ReceiverStatus":
                pending["status"] = data

            case "SatVisibility":
                pending["satvis"] = list(data.get("SatInfo", []))

            case "MeasExtra":
                pending["extra"] = list(data.get("MeasExtraChannel", []))

            case "MeasEpoch":
                # --- Obs side ---
                epoch = self._decode_epoch(data, freq_nr_cache, delta_ls)
                if epoch is not None:
                    ts_np = np.datetime64(
                        epoch.timestamp.replace(tzinfo=None), "ns"
                    )
                    timestamps_obs.append(ts_np)
                    e_snr: dict[str, float] = {}
                    e_pr: dict[str, float] = {}
                    e_ph: dict[str, float] = {}
                    e_dop: dict[str, float] = {}
                    for obs in epoch.observations:
                        props = _sid_props_from_obs(
                            obs.svid, obs.signal_num, freq_nr_cache
                        )
                        if props is None:
                            continue
                        sid = props["sid"]
                        if sid not in sid_props_obs:
                            sid_props_obs[sid] = props
                        if obs.cn0 is not None:
                            e_snr[sid] = float(obs.cn0.to(UREG.dBHz).magnitude)
                        if obs.pseudorange is not None:
                            e_pr[sid] = float(
                                obs.pseudorange.to(UREG.meter).magnitude
                            )
                        if obs.phase_cycles is not None:
                            e_ph[sid] = obs.phase_cycles
                        if obs.doppler is not None:
                            e_dop[sid] = float(obs.doppler.to(UREG.Hz).magnitude)
                    epoch_rows.append((e_snr, e_pr, e_ph, e_dop))

                # --- Metadata side (always, even if epoch decoded as None) ---
                tow_ms = int(data["TOW"])
                wn = int(data["WNc"])
                ts_meta = _tow_wn_to_utc(tow_ms, wn, delta_ls)
                obs_map = _build_obs_map(data)

                # Discover sids from Type1/Type2 sub-blocks (same as to_metadata_ds)
                for t1 in data.get("Type_1", []):
                    svid1 = int(t1["SVID"])
                    props1 = _sid_props_from_obs(
                        svid1,
                        decode_signal_num(int(t1["Type"]), int(t1["ObsInfo"])),
                        freq_nr_cache,
                    )
                    if props1 is not None and props1["sid"] not in sid_props_meta:
                        sid_props_meta[props1["sid"]] = props1
                    for t2 in t1.get("Type_2", []):
                        props2 = _sid_props_from_obs(
                            svid1,
                            decode_signal_num(int(t2["Type"]), int(t2["ObsInfo"])),
                            freq_nr_cache,
                        )
                        if (
                            props2 is not None
                            and props2["sid"] not in sid_props_meta
                        ):
                            sid_props_meta[props2["sid"]] = props2

                records.append(
                    (
                        ts_meta,
                        pending["pvt"],
                        pending["dop"],
                        pending["status"],
                        list(pending["satvis"]),
                        list(pending["extra"]),
                        obs_map,
                    )
                )
                pending = {
                    "pvt": None,
                    "dop": None,
                    "status": None,
                    "satvis": [],
                    "extra": [],
                }

    # ----------------------------------------------------------------
    # Build obs dataset (verbatim from to_ds())
    # ----------------------------------------------------------------
    sorted_sids = sorted(sid_props_obs)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(timestamps_obs)
    n_sids = len(sorted_sids)

    snr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
    pr_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
    ph_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
    dop_arr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
    ssi_arr = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

    for t_idx, (e_snr, e_pr, e_ph, e_dop) in enumerate(epoch_rows):
        for sid, val in e_snr.items():
            snr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_pr.items():
            pr_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_ph.items():
            ph_arr[t_idx, sid_to_idx[sid]] = val
        for sid, val in e_dop.items():
            dop_arr[t_idx, sid_to_idx[sid]] = val

    freq_center = np.asarray(
        [sid_props_obs[s]["freq_center"] for s in sorted_sids],
        dtype=DTYPES["freq_center"],
    )
    freq_min = np.asarray(
        [sid_props_obs[s]["freq_min"] for s in sorted_sids],
        dtype=DTYPES["freq_min"],
    )
    freq_max = np.asarray(
        [sid_props_obs[s]["freq_max"] for s in sorted_sids],
        dtype=DTYPES["freq_max"],
    )

    coords_obs: dict[str, Any] = {
        "epoch": ("epoch", timestamps_obs, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": (
            "sid",
            np.array([sid_props_obs[s]["sv"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            np.array(
                [sid_props_obs[s]["system"] for s in sorted_sids], dtype=object
            ),
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            np.array([sid_props_obs[s]["band"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            np.array([sid_props_obs[s]["code"] for s in sorted_sids], dtype=object),
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    attrs = cast(dict[str, Any], self._build_attrs())

    try:
        import pymap3d as pm

        hdr = self.header
        lat_deg = math.degrees(hdr.latitude_rad)
        lon_deg = math.degrees(hdr.longitude_rad)
        h_m = float(hdr.height_m.to(UREG.meter).magnitude)
        x, y, z = pm.geodetic2ecef(lat_deg, lon_deg, h_m)
        attrs["APPROX POSITION X"] = float(x)
        attrs["APPROX POSITION Y"] = float(y)
        attrs["APPROX POSITION Z"] = float(z)
    except (LookupError, AttributeError):
        pass

    obs_ds = xr.Dataset(
        data_vars={
            "SNR": (["epoch", "sid"], snr_arr, CN0_METADATA),
            "Pseudorange": (
                ["epoch", "sid"],
                pr_arr,
                OBSERVABLES_METADATA["Pseudorange"],
            ),
            "Phase": (["epoch", "sid"], ph_arr, OBSERVABLES_METADATA["Phase"]),
            "Doppler": (["epoch", "sid"], dop_arr, OBSERVABLES_METADATA["Doppler"]),
            "SSI": (["epoch", "sid"], ssi_arr, OBSERVABLES_METADATA["SSI"]),
        },
        coords=coords_obs,
        attrs=attrs,
    )

    if keep_data_vars is not None:
        for var in list(obs_ds.data_vars):
            if var not in keep_data_vars:
                obs_ds = obs_ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        obs_ds = pad_to_global_sid(
            obs_ds,
            keep_sids=cast(list[str] | None, kwargs.get("keep_sids")),
        )

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        obs_ds = strip_fillvalue(obs_ds)

    validate_dataset(obs_ds, required_vars=keep_data_vars)

    # ----------------------------------------------------------------
    # Build metadata dataset (verbatim from to_metadata_ds())
    # ----------------------------------------------------------------
    sorted_sids_meta = sorted(sid_props_meta)
    sid_to_idx_meta = {sid: i for i, sid in enumerate(sorted_sids_meta)}
    n_epochs_meta = len(records)
    n_sids_meta = len(sorted_sids_meta)

    sids_for_sv: dict[str, list[int]] = {}
    for sid in sorted_sids_meta:
        sv = sid_props_meta[sid]["sv"]
        sids_for_sv.setdefault(sv, []).append(sid_to_idx_meta[sid])

    theta_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    phi_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    rise_set_arr = np.full((n_epochs_meta, n_sids_meta), -1, dtype=np.int8)
    mp_corr_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    smoothing_corr_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    code_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    carr_var_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    lock_time_arr = np.full((n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32)
    cum_loss_cont_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    car_mp_corr_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )
    cn0_highres_arr = np.full(
        (n_epochs_meta, n_sids_meta), np.nan, dtype=np.float32
    )

    pdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    hdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    vdop_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    n_sv_arr = np.full(n_epochs_meta, -1, dtype=np.int16)
    h_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    v_acc_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    pvt_mode_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
    mean_corr_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    cpu_load_arr = np.full(n_epochs_meta, -1, dtype=np.int8)
    temp_arr = np.full(n_epochs_meta, np.nan, dtype=np.float32)
    rx_error_arr = np.full(n_epochs_meta, 0, dtype=np.int32)

    timestamps_meta: list[np.datetime64] = []

    for t_idx, (ts, pvt, dop, status, satvis, extra, obs_map) in enumerate(records):
        timestamps_meta.append(np.datetime64(ts.replace(tzinfo=None), "ns"))

        if dop is not None:
            try:
                pdop_arr[t_idx] = float(dop["PDOP"]) * 0.01
                hdop_arr[t_idx] = float(dop["HDOP"]) * 0.01
                vdop_arr[t_idx] = float(dop["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        if pvt is not None:
            try:
                n_sv_arr[t_idx] = int(pvt.get("NrSV", pvt.get("NrSVAnt", -1)))
                raw_hacc = int(pvt["HAccuracy"])
                if raw_hacc != 65535:
                    h_acc_arr[t_idx] = raw_hacc * 0.01
                raw_vacc = int(pvt["VAccuracy"])
                if raw_vacc != 65535:
                    v_acc_arr[t_idx] = raw_vacc * 0.01
                pvt_mode_arr[t_idx] = int(pvt["Mode"])
                mean_corr_arr[t_idx] = float(pvt["MeanCorrAge"]) * 0.01
                if np.isnan(pdop_arr[t_idx]):
                    pdop_arr[t_idx] = float(pvt["PDOP"]) * 0.01
                    hdop_arr[t_idx] = float(pvt["HDOP"]) * 0.01
                    vdop_arr[t_idx] = float(pvt["VDOP"]) * 0.01
            except (KeyError, TypeError, ValueError):
                pass

        if status is not None:
            try:
                cpu_load_arr[t_idx] = int(status["CPULoad"])
                raw_temp = int(status["Temperature"])
                if raw_temp != 0:  # 0 is DoNotUse (RefGuide p.397)
                    temp_arr[t_idx] = float(raw_temp - 100)
                rx_error_arr[t_idx] = int(status["RxError"])
            except (KeyError, TypeError, ValueError):
                pass

        for sat_info in satvis:
            try:
                svid_raw = int(sat_info["SVID"])
                sys_code, prn = decode_svid(svid_raw)
                sv = f"{sys_code}{prn:02d}"
                theta_deg = 90.0 - int(sat_info["Elevation"]) * 0.01
                phi_deg = int(sat_info["Azimuth"]) * 0.01
                rs = int(sat_info["RiseSet"])
                for s_idx in sids_for_sv.get(sv, []):
                    theta_arr[t_idx, s_idx] = theta_deg
                    phi_arr[t_idx, s_idx] = phi_deg
                    rise_set_arr[t_idx, s_idx] = rs
            except (KeyError, TypeError, ValueError):
                pass

        for ch in extra:
            try:
                type_byte = int(ch["Type"])
                info_byte = int(ch.get("ObsInfo", ch.get("Info", 0)))
                sig_num = decode_signal_num(type_byte, info_byte)
                rx_ch = int(ch["RxChannel"])
                svid = obs_map.get((rx_ch, sig_num))
                if svid is None:
                    continue
                sig_def = SIGNAL_TABLE.get(sig_num)
                if sig_def is None:
                    continue
                sys_code2, prn2 = decode_svid(svid)
                sv2 = f"{sys_code2}{prn2:02d}"
                sid = f"{sv2}|{sig_def.band}|{sig_def.code}"
                s_idx = sid_to_idx_meta.get(sid)
                if s_idx is None:
                    continue
                mp_raw = int(ch.get("MPCorrection ", ch.get("MPCorrection", 0)))
                mp_corr_arr[t_idx, s_idx] = mp_raw * 0.001
                # SmoothingCorr: i2, scale 0.001 m/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                raw_sc = ch.get("SmoothingCorr")
                if raw_sc is not None:
                    smoothing_corr_arr[t_idx, s_idx] = int(raw_sc) * 0.001
                raw_cv = ch.get("CodeVar")
                # CodeVar: u2, scale 0.0001 m²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cv is not None and int(raw_cv) != 65535:
                    code_var_arr[t_idx, s_idx] = int(raw_cv) * 1e-4
                raw_rv = ch.get("CarrierVar")
                # CarrierVar: u2, scale 1 mcycle²/LSB, Do-Not-Use 65535
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_rv is not None and int(raw_rv) != 65535:
                    carr_var_arr[t_idx, s_idx] = float(raw_rv)
                raw_lt = ch.get("LockTime")
                # LockTime: u2, scale 1 s/LSB, Do-Not-Use 65535, clipped to 65534 s
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_lt is not None and int(raw_lt) != 65535:
                    lock_time_arr[t_idx, s_idx] = float(raw_lt)
                raw_clc = ch.get("CumLossCont")
                # CumLossCont: u1, modulo-256 counter, no Do-Not-Use
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_clc is not None:
                    cum_loss_cont_arr[t_idx, s_idx] = float(int(raw_clc))
                raw_cmc = ch.get("CarMPCorr")
                # CarMPCorr: i1, scale 1/512 cycles/LSB (1.953125 mcycles/LSB)
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_cmc is not None:
                    car_mp_corr_arr[t_idx, s_idx] = int(raw_cmc) / 512.0
                raw_misc = ch.get("Misc")
                # Misc bits 0-2: CN0HighRes (u3, 0-7), scale 0.03125 dB-Hz/LSB
                # RefGuide-4.14.0, MeasExtra (Block 4000), MeasExtraChannelSub, p.265
                if raw_misc is not None:
                    cn0_hr = int(raw_misc) & 0x07
                    cn0_highres_arr[t_idx, s_idx] = cn0_hr * 0.03125
            except (KeyError, TypeError, ValueError):
                pass

    freq_center_meta = np.asarray(
        [sid_props_meta[s]["freq_center"] for s in sorted_sids_meta],
        dtype=np.float32,
    )
    freq_min_meta = np.asarray(
        [sid_props_meta[s]["freq_min"] for s in sorted_sids_meta], dtype=np.float32
    )
    freq_max_meta = np.asarray(
        [sid_props_meta[s]["freq_max"] for s in sorted_sids_meta], dtype=np.float32
    )

    coords_meta: dict[str, Any] = {
        "epoch": ("epoch", timestamps_meta, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            sorted_sids_meta, dims=["sid"], attrs=COORDS_METADATA["sid"]
        ),
        "sv": (
            "sid",
            [sid_props_meta[s]["sv"] for s in sorted_sids_meta],
            COORDS_METADATA["sv"],
        ),
        "system": (
            "sid",
            [sid_props_meta[s]["system"] for s in sorted_sids_meta],
            COORDS_METADATA["system"],
        ),
        "band": (
            "sid",
            [sid_props_meta[s]["band"] for s in sorted_sids_meta],
            COORDS_METADATA["band"],
        ),
        "code": (
            "sid",
            [sid_props_meta[s]["code"] for s in sorted_sids_meta],
            COORDS_METADATA["code"],
        ),
        "freq_center": ("sid", freq_center_meta, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min_meta, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max_meta, COORDS_METADATA["freq_max"]),
        "pdop": ("epoch", pdop_arr, _PDOP_ATTRS),
        "hdop": ("epoch", hdop_arr, _HDOP_ATTRS),
        "vdop": ("epoch", vdop_arr, _VDOP_ATTRS),
        "n_sv": ("epoch", n_sv_arr, _N_SV_ATTRS),
        "h_accuracy_m": ("epoch", h_acc_arr, _H_ACCURACY_ATTRS),
        "v_accuracy_m": ("epoch", v_acc_arr, _V_ACCURACY_ATTRS),
        "pvt_mode": ("epoch", pvt_mode_arr, _PVT_MODE_ATTRS),
        "mean_corr_age_s": ("epoch", mean_corr_arr, _MEAN_CORR_AGE_ATTRS),
        "cpu_load": ("epoch", cpu_load_arr, _CPU_LOAD_ATTRS),
        "temperature_c": ("epoch", temp_arr, _TEMPERATURE_ATTRS),
        "rx_error": ("epoch", rx_error_arr, _RX_ERROR_ATTRS),
    }

    attrs_meta = self._build_attrs()

    meta_ds = xr.Dataset(
        data_vars={
            "broadcast_theta": (
                ["epoch", "sid"],
                np.deg2rad(theta_arr),
                _BROADCAST_THETA_ATTRS,
            ),
            "broadcast_phi": (
                ["epoch", "sid"],
                np.deg2rad(phi_arr),
                _BROADCAST_PHI_ATTRS,
            ),
            "rise_set": (["epoch", "sid"], rise_set_arr, _RISE_SET_ATTRS),
            "mp_correction_m": (
                ["epoch", "sid"],
                mp_corr_arr,
                _MP_CORRECTION_ATTRS,
            ),
            "smoothing_corr_m": (
                ["epoch", "sid"],
                smoothing_corr_arr,
                _SMOOTHING_CORR_ATTRS,
            ),
            "code_var": (["epoch", "sid"], code_var_arr, _CODE_VAR_ATTRS),
            "carrier_var": (["epoch", "sid"], carr_var_arr, _CARRIER_VAR_ATTRS),
            "lock_time_s": (["epoch", "sid"], lock_time_arr, _LOCK_TIME_ATTRS),
            "cum_loss_cont": (
                ["epoch", "sid"],
                cum_loss_cont_arr,
                _CUM_LOSS_CONT_ATTRS,
            ),
            "car_mp_corr_cycles": (
                ["epoch", "sid"],
                car_mp_corr_arr,
                _CAR_MP_CORR_ATTRS,
            ),
            "cn0_highres_correction": (
                ["epoch", "sid"],
                cn0_highres_arr,
                _CN0_HIGHRES_CORRECTION_ATTRS,
            ),
        },
        coords=coords_meta,
        attrs=attrs_meta,
    )

    # Align meta_ds SID to obs_ds SID.
    # obs uses sid_props_obs (MeasEpoch); meta uses sid_props_meta (Type1/Type2)
    # — they can diverge.  Reindex fills missing SIDs with NaN.
    meta_ds = meta_ds.reindex(sid=obs_ds.sid, fill_value=np.nan)
    # rise_set is int8 with sentinel -1; NaN fill promotes to float — cast back.
    if meta_ds["rise_set"].dtype != np.int8:
        meta_ds["rise_set"] = meta_ds["rise_set"].fillna(-1).astype(np.int8)

    # Apply CN0HighRes correction from MeasExtra (Block 4000) to SNR.
    # CN0HighRes extends resolution from 0.25 to 0.03125 dB-Hz.
    # RefGuide-4.14.0, MeasExtra MeasExtraChannelSub.Misc bits 0-2, p.265.
    # Where MeasExtra was not logged the correction array is NaN → no-op.
    corr = meta_ds["cn0_highres_correction"].values  # (epoch, sid), NaN if absent
    snr_raw_values = obs_ds["SNR"].values.copy()  # preserve 0.25 dB-Hz original
    snr_corrected = snr_raw_values.copy()
    valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
    snr_corrected[valid] += corr[valid]
    snr_attrs = dict(obs_ds["SNR"].attrs)
    snr_attrs["comment"] = (
        snr_attrs.get("comment", "")
        + " CN0HighRes correction from MeasExtra (Block 4000, p.265) applied where"
        " available, improving resolution from 0.25 to 0.03125 dB-Hz."
    ).lstrip()
    obs_ds["SNR"] = xr.DataArray(
        snr_corrected,
        dims=["epoch", "sid"],
        coords=obs_ds["SNR"].coords,
        attrs=snr_attrs,
    )

    if store_raw_observables:
        # ------------------------------------------------------------------
        # Add "physically raw" observables: pre-correction versions of SNR,
        # pseudorange, and carrier phase.  NaN where MeasExtra was absent.
        # Gated by store_raw_observables (config: store_sbf_raw_observables).
        # ------------------------------------------------------------------

        # SNR_raw: 0.25 dB-Hz resolution, before CN0HighRes extension.
        obs_ds["SNR_raw"] = xr.DataArray(
            snr_raw_values,
            dims=["epoch", "sid"],
            coords=obs_ds["SNR"].coords,
            attrs=_SNR_RAW_ATTRS,
        )

        # Pseudorange_unsmoothed: Hatch-filter correction removed.
        smooth = meta_ds["smoothing_corr_m"].values
        pr_vals = obs_ds["Pseudorange"].values
        pr_unsmoothed = np.where(
            ~np.isnan(smooth), pr_vals + smooth, np.nan
        ).astype(np.float64)
        obs_ds["Pseudorange_unsmoothed"] = xr.DataArray(
            pr_unsmoothed,
            dims=["epoch", "sid"],
            coords=obs_ds["Pseudorange"].coords,
            attrs=_PSEUDORANGE_UNSMOOTHED_ATTRS,
        )

        # Pseudorange_raw: both Hatch-filter and multipath corrections removed.
        mp = meta_ds["mp_correction_m"].values
        available = ~np.isnan(smooth) & ~np.isnan(mp)
        pr_raw = np.where(available, pr_vals + smooth + mp, np.nan).astype(
            np.float64
        )
        obs_ds["Pseudorange_raw"] = xr.DataArray(
            pr_raw,
            dims=["epoch", "sid"],
            coords=obs_ds["Pseudorange"].coords,
            attrs=_PSEUDORANGE_RAW_ATTRS,
        )

        # Phase_raw: carrier multipath correction removed.
        car_mp = meta_ds["car_mp_corr_cycles"].values
        ph_vals = obs_ds["Phase"].values
        ph_raw = np.where(~np.isnan(car_mp), ph_vals + car_mp, np.nan).astype(
            np.float64
        )
        obs_ds["Phase_raw"] = xr.DataArray(
            ph_raw,
            dims=["epoch", "sid"],
            coords=obs_ds["Phase"].coords,
            attrs=_PHASE_RAW_ATTRS,
        )

    return obs_ds, {"sbf_obs": meta_ds}
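The NaN-masked, in-place SNR correction performed above can be illustrated with a small self-contained sketch (the array values here are made up for illustration):

```python
import numpy as np

# Toy (epoch, sid) arrays: SNR at 0.25 dB-Hz resolution, and the
# CN0HighRes correction (NaN where MeasExtra was not logged).
snr = np.array([[40.25, np.nan],
                [41.00, 42.50]])
corr = np.array([[0.09375, 0.03125],
                 [np.nan, 0.0]])

snr_corrected = snr.copy()
# Correct only where both the observation and the correction exist.
valid = ~np.isnan(snr_corrected) & ~np.isnan(corr)
snr_corrected[valid] += corr[valid]
```

Missing SNR samples stay NaN, and epochs without MeasExtra keep the coarse 0.25 dB-Hz value, matching the "no-op where absent" behavior described in the comments.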

__repr__()

Return a short string representation.

Source code in packages/canvod-readers/src/canvod/readers/sbf/reader.py
def __repr__(self) -> str:
    """Return a short string representation."""
    return f"SbfReader(file='{self.fpath.name}', epochs={self.num_epochs})"

DataDirMatcher

Match RINEX data directories for canopy and reference receivers.

Scans a root directory structure to find dates with RINEX files present in both canopy and reference receiver directories. Deprecated: initialization raises a DeprecationWarning; use canvod.virtualiconvname.FilenameMapper with DataDirectoryValidator instead.

Parameters

root : Path
    Root directory containing receiver subdirectories.
reference_pattern : Path, optional
    Relative path pattern for reference receiver
    (default: "01_reference/01_GNSS/01_raw").
canopy_pattern : Path, optional
    Relative path pattern for canopy receiver
    (default: "02_canopy/01_GNSS/01_raw").

Examples

from pathlib import Path
matcher = DataDirMatcher(
    root=Path("/data/01_Rosalia"),
    reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern=Path("02_canopy/01_GNSS/01_raw"),
)

Iterate over matched directories

for matched_dirs in matcher:
    print(matched_dirs.yyyydoy)
    rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    print(f"  Found {len(rinex_files)} RINEX files")

Get list of common dates

dates = matcher.get_common_dates()
print(f"Found {len(dates)} dates with data")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class DataDirMatcher:
    """Match RINEX data directories for canopy and reference receivers.

    Scans a root directory structure to find dates with RINEX files
    present in both canopy and reference receiver directories.

    Parameters
    ----------
    root : Path
        Root directory containing receiver subdirectories
    reference_pattern : Path, optional
        Relative path pattern for reference receiver
        (default: "01_reference/01_GNSS/01_raw")
    canopy_pattern : Path, optional
        Relative path pattern for canopy receiver
        (default: "02_canopy/01_GNSS/01_raw")

    Examples
    --------
    >>> from pathlib import Path
    >>> matcher = DataDirMatcher(
    ...     root=Path("/data/01_Rosalia"),
    ...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    ...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
    ... )
    >>>
    >>> # Iterate over matched directories
    >>> for matched_dirs in matcher:
    ...     print(matched_dirs.yyyydoy)
    ...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    ...     print(f"  Found {len(rinex_files)} RINEX files")

    >>> # Get list of common dates
    >>> dates = matcher.get_common_dates()
    >>> print(f"Found {len(dates)} dates with data")

    """

    def __init__(
        self,
        root: Path,
        reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
        canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
    ) -> None:
        """Initialize matcher with directory structure."""
        import warnings

        warnings.warn(
            "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.root = Path(root)
        self.reference_dir = self.root / reference_pattern
        self.canopy_dir = self.root / canopy_pattern

        # Validate directories exist
        self._validate_directory(self.root, "Root")
        self._validate_directory(self.reference_dir, "Reference")
        self._validate_directory(self.canopy_dir, "Canopy")

    def __iter__(self) -> Iterator[MatchedDirs]:
        """Iterate over matched directory pairs with RINEX files.

        Yields
        ------
        MatchedDirs
            Matched directories for each date.

        """
        for date_str in self.get_common_dates():
            yield MatchedDirs(
                canopy_data_dir=self.canopy_dir / date_str,
                reference_data_dir=self.reference_dir / date_str,
                yyyydoy=YYYYDOY.from_yydoy_str(date_str),
            )

    def get_common_dates(self) -> list[str]:
        """Get dates with RINEX files in both receivers.

        Uses parallel processing to check directories efficiently.

        Returns
        -------
        list[str]
            Sorted list of date strings (YYDDD format, e.g., "25001")
            that have RINEX files in both canopy and reference directories.

        """
        # Find dates with RINEX in each receiver
        ref_dates = self._get_dates_with_rinex(self.reference_dir)
        can_dates = self._get_dates_with_rinex(self.canopy_dir)

        # Find intersection
        common = ref_dates & can_dates
        common.discard("00000")  # Remove placeholder directories

        # Sort naturally (numerical order)
        return natsorted(common)

    def _get_dates_with_rinex(self, base_dir: Path) -> set[str]:
        """Find all date directories containing RINEX files.

        Uses parallel processing to check multiple directories at once.

        Parameters
        ----------
        base_dir : Path
            Base directory to search (e.g., canopy or reference root).

        Returns
        -------
        set[str]
            Set of date directory names that contain RINEX files.

        """
        # Get all subdirectories
        date_dirs = (d for d in base_dir.iterdir() if d.is_dir())

        # Check for RINEX files in parallel
        dates_with_rinex = set()

        with ThreadPoolExecutor() as executor:
            future_to_dir = {
                executor.submit(self._has_rinex_files, d): d for d in date_dirs
            }

            for future in as_completed(future_to_dir):
                directory = future_to_dir[future]
                if future.result():
                    dates_with_rinex.add(directory.name)

        return dates_with_rinex

    @staticmethod
    def _has_rinex_files(directory: Path) -> bool:
        """Check if directory contains RINEX observation files.

        Parameters
        ----------
        directory : Path
            Directory to check.

        Returns
        -------
        bool
            True if RINEX files found.

        """
        return _has_rinex_files(directory)

    def _validate_directory(self, path: Path, name: str) -> None:
        """Validate directory exists.

        Parameters
        ----------
        path : Path
            Directory to check.
        name : str
            Name for error message.

        Raises
        ------
        FileNotFoundError
            If directory doesn't exist.

        """
        if not path.exists():
            msg = f"{name} directory not found: {path}"
            raise FileNotFoundError(msg)

__init__(root, reference_pattern=Path('01_reference/01_GNSS/01_raw'), canopy_pattern=Path('02_canopy/01_GNSS/01_raw'))

Initialize matcher with directory structure.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    root: Path,
    reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
) -> None:
    """Initialize matcher with directory structure."""
    import warnings

    warnings.warn(
        "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.root = Path(root)
    self.reference_dir = self.root / reference_pattern
    self.canopy_dir = self.root / canopy_pattern

    # Validate directories exist
    self._validate_directory(self.root, "Root")
    self._validate_directory(self.reference_dir, "Reference")
    self._validate_directory(self.canopy_dir, "Canopy")

__iter__()

Iterate over matched directory pairs with RINEX files.

Yields

MatchedDirs
    Matched directories for each date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[MatchedDirs]:
    """Iterate over matched directory pairs with RINEX files.

    Yields
    ------
    MatchedDirs
        Matched directories for each date.

    """
    for date_str in self.get_common_dates():
        yield MatchedDirs(
            canopy_data_dir=self.canopy_dir / date_str,
            reference_data_dir=self.reference_dir / date_str,
            yyyydoy=YYYYDOY.from_yydoy_str(date_str),
        )

get_common_dates()

Get dates with RINEX files in both receivers.

Uses parallel processing to check directories efficiently.

Returns

list[str]
    Sorted list of date strings (YYDDD format, e.g., "25001") that have
    RINEX files in both canopy and reference directories.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def get_common_dates(self) -> list[str]:
    """Get dates with RINEX files in both receivers.

    Uses parallel processing to check directories efficiently.

    Returns
    -------
    list[str]
        Sorted list of date strings (YYDDD format, e.g., "25001")
        that have RINEX files in both canopy and reference directories.

    """
    # Find dates with RINEX in each receiver
    ref_dates = self._get_dates_with_rinex(self.reference_dir)
    can_dates = self._get_dates_with_rinex(self.canopy_dir)

    # Find intersection
    common = ref_dates & can_dates
    common.discard("00000")  # Remove placeholder directories

    # Sort naturally (numerical order)
    return natsorted(common)
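The method above discards the "00000" placeholder and natural-sorts the remaining YYDDD strings. The effect can be sketched with the stdlib alone (a numeric sort key stands in for natsorted here, which is equivalent for fixed-width digit strings):

```python
# Illustrative: YYDDD date strings ("25001" = 2025, day-of-year 1).
dates = {"25010", "00000", "25001", "24365"}
dates.discard("00000")           # drop placeholder directories
common = sorted(dates, key=int)  # numeric order, as natsorted would give
```

Because YYDDD names are always five digits, plain lexicographic sorting would also work; natsorted keeps the behavior robust if directory names ever vary in width.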

PairDataDirMatcher

Match RINEX directories for receiver pairs across dates.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site. Requires a configuration dict specifying receiver locations and analysis pairs. Deprecated: initialization raises a DeprecationWarning; use canvod.virtualiconvname.FilenameMapper with DataDirectoryValidator instead.

Parameters

base_dir : Path
    Root directory containing all receiver data.
receivers : dict
    Receiver configuration mapping receiver names to their directory paths.
    The directory value is the full relative path from base_dir to the raw
    RINEX data directory (before the {YYDOY} date folders).
    Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
              "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
analysis_pairs : dict
    Analysis pair configuration specifying which receivers to match.
    Example: {"pair_01": {"canopy_receiver": "canopy_01",
                          "reference_receiver": "reference_01"}}

Examples

receivers = {
    "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    "reference_01": {"directory": "01_reference/01_GNSS/01_raw"},
}
pairs = {
    "main_pair": {
        "canopy_receiver": "canopy_01",
        "reference_receiver": "reference_01",
    },
}

matcher = PairDataDirMatcher(
    base_dir=Path("/data/01_Rosalia"),
    receivers=receivers,
    analysis_pairs=pairs,
)

for matched in matcher:
    print(f"{matched.yyyydoy}: {matched.pair_name}")
    print(f"  Canopy: {matched.canopy_data_dir}")
    print(f"  Reference: {matched.reference_data_dir}")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class PairDataDirMatcher:
    """Match RINEX directories for receiver pairs across dates.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site. Requires a configuration dict
    specifying receiver locations and analysis pairs.

    Parameters
    ----------
    base_dir : Path
        Root directory containing all receiver data
    receivers : dict
        Receiver configuration mapping receiver names to their directory paths.
        The ``directory`` value is the full relative path from ``base_dir`` to the
        raw RINEX data directory (before the ``{YYDOY}`` date folders).
        Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
                  "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
    analysis_pairs : dict
        Analysis pair configuration specifying which receivers to match
        Example: {"pair_01": {"canopy_receiver": "canopy_01",
                               "reference_receiver": "reference_01"}}

    Examples
    --------
    >>> receivers = {
    ...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    ...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
    ... }
    >>> pairs = {
    ...     "main_pair": {
    ...         "canopy_receiver": "canopy_01",
    ...         "reference_receiver": "reference_01"
    ...     }
    ... }
    >>>
    >>> matcher = PairDataDirMatcher(
    ...     base_dir=Path("/data/01_Rosalia"),
    ...     receivers=receivers,
    ...     analysis_pairs=pairs
    ... )
    >>>
    >>> for matched in matcher:
    ...     print(f"{matched.yyyydoy}: {matched.pair_name}")
    ...     print(f"  Canopy: {matched.canopy_data_dir}")
    ...     print(f"  Reference: {matched.reference_data_dir}")

    """

    def __init__(
        self,
        base_dir: Path,
        receivers: dict[str, dict[str, str]],
        analysis_pairs: dict[str, dict[str, str]],
    ) -> None:
        """Initialize pair matcher with receiver configuration."""
        import warnings

        warnings.warn(
            "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.base_dir = Path(base_dir)
        self.receivers = receivers
        self.analysis_pairs = analysis_pairs

        # Validate receivers have directory config
        self.receiver_dirs = self._build_receiver_dir_mapping()

    def _build_receiver_dir_mapping(self) -> dict[str, str]:
        """Map receiver names to their directory prefixes.

        Returns
        -------
        dict[str, str]
            Mapping of receiver name to directory path.

        Raises
        ------
        ValueError
            If receiver missing 'directory' in config.

        """
        mapping = {}
        for receiver_name, config in self.receivers.items():
            if "directory" not in config:
                msg = f"Receiver '{receiver_name}' missing 'directory' in config"
                raise ValueError(msg)
            mapping[receiver_name] = config["directory"]
        return mapping

    def _get_receiver_path(self, receiver_name: str, yyyydoy: YYYYDOY) -> Path:
        """Build full path to receiver data for a specific date.

        Parameters
        ----------
        receiver_name : str
            Receiver name (e.g., "canopy_01").
        yyyydoy : YYYYDOY
            Date object.

        Returns
        -------
        Path
            Full path to receiver's RINEX directory for the date.

        """
        receiver_dir = self.receiver_dirs[receiver_name]

        # Convert YYYYDDD to YYDDD format for directory name
        yyddd_str = yyyydoy.yydoy
        if yyddd_str is None:
            msg = f"Missing YYDDD representation for date {yyyydoy}"
            raise ValueError(msg)

        return self.base_dir / receiver_dir / yyddd_str

    def _get_all_dates(self) -> set[YYYYDOY]:
        """Find all dates that have data in any receiver directory.

        Returns
        -------
        set[YYYYDOY]
            Set of all dates with available data.

        """
        all_dates = set()

        for receiver_name in self.receivers:
            receiver_dir = self.receiver_dirs[receiver_name]
            receiver_base = self.base_dir / receiver_dir

            if not receiver_base.exists():
                continue

            # Find all date directories (format: YYDDD - 5 digits)
            for date_dir in receiver_base.iterdir():
                if not date_dir.is_dir():
                    continue

                # Check if directory name is 5 digits
                if len(date_dir.name) != DATE_DIR_LEN or not date_dir.name.isdigit():
                    continue

                # Skip placeholder directories
                if date_dir.name == "00000":
                    continue

                try:
                    yyyydoy = YYYYDOY.from_yydoy_str(date_dir.name)
                    all_dates.add(yyyydoy)
                except ValueError:
                    continue

        return all_dates

    def __iter__(self) -> Iterator[PairMatchedDirs]:
        """Iterate over all date/pair combinations with available data.

        Yields
        ------
        PairMatchedDirs
            Matched directories for a receiver pair on a specific date.

        """
        all_dates = sorted(self._get_all_dates())

        for yyyydoy in all_dates:
            # For each configured analysis pair
            for pair_name, pair_config in self.analysis_pairs.items():
                canopy_rx = pair_config["canopy_receiver"]
                reference_rx = pair_config["reference_receiver"]

                # Build paths for this pair
                canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
                reference_path = self._get_receiver_path(reference_rx, yyyydoy)

                # Check for RINEX files
                canopy_has_files = _has_rinex_files(canopy_path)
                reference_has_files = _has_rinex_files(reference_path)

                # Only yield if both directories exist and have data
                if canopy_has_files and reference_has_files:
                    yield PairMatchedDirs(
                        yyyydoy=yyyydoy,
                        pair_name=pair_name,
                        canopy_receiver=canopy_rx,
                        reference_receiver=reference_rx,
                        canopy_data_dir=canopy_path,
                        reference_data_dir=reference_path,
                    )

__init__(base_dir, receivers, analysis_pairs)

Initialize pair matcher with receiver configuration.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    base_dir: Path,
    receivers: dict[str, dict[str, str]],
    analysis_pairs: dict[str, dict[str, str]],
) -> None:
    """Initialize pair matcher with receiver configuration."""
    import warnings

    warnings.warn(
        "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.base_dir = Path(base_dir)
    self.receivers = receivers
    self.analysis_pairs = analysis_pairs

    # Validate receivers have directory config
    self.receiver_dirs = self._build_receiver_dir_mapping()

__iter__()

Iterate over all date/pair combinations with available data.

Yields

PairMatchedDirs
    Matched directories for a receiver pair on a specific date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[PairMatchedDirs]:
    """Iterate over all date/pair combinations with available data.

    Yields
    ------
    PairMatchedDirs
        Matched directories for a receiver pair on a specific date.

    """
    all_dates = sorted(self._get_all_dates())

    for yyyydoy in all_dates:
        # For each configured analysis pair
        for pair_name, pair_config in self.analysis_pairs.items():
            canopy_rx = pair_config["canopy_receiver"]
            reference_rx = pair_config["reference_receiver"]

            # Build paths for this pair
            canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
            reference_path = self._get_receiver_path(reference_rx, yyyydoy)

            # Check for RINEX files
            canopy_has_files = _has_rinex_files(canopy_path)
            reference_has_files = _has_rinex_files(reference_path)

            # Only yield if both directories exist and have data
            if canopy_has_files and reference_has_files:
                yield PairMatchedDirs(
                    yyyydoy=yyyydoy,
                    pair_name=pair_name,
                    canopy_receiver=canopy_rx,
                    reference_receiver=reference_rx,
                    canopy_data_dir=canopy_path,
                    reference_data_dir=reference_path,
                )

MatchedDirs dataclass

Matched directory paths for canopy and reference receivers.

Immutable container representing a pair of directories containing RINEX data for the same date.

Parameters

canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference (open-sky) receiver RINEX directory.
yyyydoy : YYYYDOY
    Date object for this matched pair.

Examples

from pathlib import Path
from canvod.utils.tools import YYYYDOY

md = MatchedDirs(
    canopy_data_dir=Path("/data/02_canopy/25001"),
    reference_data_dir=Path("/data/01_reference/25001"),
    yyyydoy=YYYYDOY.from_str("2025001"),
)
md.yyyydoy.to_str()  # '2025001'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass(frozen=True)
class MatchedDirs:
    """Matched directory paths for canopy and reference receivers.

    Immutable container representing a pair of directories containing
    RINEX data for the same date.

    Parameters
    ----------
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference (open-sky) receiver RINEX directory.
    yyyydoy : YYYYDOY
        Date object for this matched pair.

    Examples
    --------
    >>> from pathlib import Path
    >>> from canvod.utils.tools import YYYYDOY
    >>>
    >>> md = MatchedDirs(
    ...     canopy_data_dir=Path("/data/02_canopy/25001"),
    ...     reference_data_dir=Path("/data/01_reference/25001"),
    ...     yyyydoy=YYYYDOY.from_str("2025001")
    ... )
    >>> md.yyyydoy.to_str()
    '2025001'

    """

    canopy_data_dir: Path
    reference_data_dir: Path
    yyyydoy: YYYYDOY

PairMatchedDirs dataclass

Matched directories for a receiver pair on a specific date.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site.

Parameters

yyyydoy : YYYYDOY
    Date for this matched pair.
pair_name : str
    Identifier for this receiver pair (e.g., "pair_01").
canopy_receiver : str
    Name of canopy receiver (e.g., "canopy_01").
reference_receiver : str
    Name of reference receiver (e.g., "reference_01").
canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference receiver RINEX directory.

Examples

pmd = PairMatchedDirs(
    yyyydoy=YYYYDOY.from_str("2025001"),
    pair_name="pair_01",
    canopy_receiver="canopy_01",
    reference_receiver="reference_01",
    canopy_data_dir=Path("/data/canopy_01/25001"),
    reference_data_dir=Path("/data/reference_01/25001"),
)
pmd.pair_name  # 'pair_01'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass
class PairMatchedDirs:
    """Matched directories for a receiver pair on a specific date.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site.

    Parameters
    ----------
    yyyydoy : YYYYDOY
        Date for this matched pair.
    pair_name : str
        Identifier for this receiver pair (e.g., "pair_01").
    canopy_receiver : str
        Name of canopy receiver (e.g., "canopy_01").
    reference_receiver : str
        Name of reference receiver (e.g., "reference_01").
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference receiver RINEX directory.

    Examples
    --------
    >>> pmd = PairMatchedDirs(
    ...     yyyydoy=YYYYDOY.from_str("2025001"),
    ...     pair_name="pair_01",
    ...     canopy_receiver="canopy_01",
    ...     reference_receiver="reference_01",
    ...     canopy_data_dir=Path("/data/canopy_01/25001"),
    ...     reference_data_dir=Path("/data/reference_01/25001")
    ... )
    >>> pmd.pair_name
    'pair_01'

    """

    yyyydoy: YYYYDOY
    pair_name: str
    canopy_receiver: str
    reference_receiver: str
    canopy_data_dir: Path
    reference_data_dir: Path

validate_dataset(ds, required_vars=None)

Validate ds meets the GNSSDataReader output contract.

Collects all violations and raises a single ValueError listing every problem, rather than stopping at the first failure.

Parameters

ds : xr.Dataset
    Dataset to validate.
required_vars : list of str, optional
    Data variables that must be present. Defaults to
    :data:DEFAULT_REQUIRED_VARS (["SNR"]).

Raises

ValueError
    If any contract violation is found.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dataset(ds: xr.Dataset, required_vars: list[str] | None = None) -> None:
    """Validate *ds* meets the GNSSDataReader output contract.

    Collects **all** violations and raises a single ``ValueError`` listing
    every problem, rather than stopping at the first failure.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset to validate.
    required_vars : list of str, optional
        Data variables that must be present.  Defaults to
        :data:`DEFAULT_REQUIRED_VARS` (``["SNR"]``).

    Raises
    ------
    ValueError
        If any contract violation is found.
    """
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)

    errors: list[str] = []

    # -- dimensions --
    missing_dims = set(REQUIRED_DIMS) - set(ds.dims)
    if missing_dims:
        errors.append(f"Missing required dimensions: {missing_dims}")

    # -- coordinates --
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in ds.coords:
            errors.append(f"Missing required coordinate: {coord}")
            continue

        actual_dtype = str(ds[coord].dtype)
        if expected_dtype == "object":
            # Accept object (VariableLengthUTF8, stable Zarr V3) and numpy 2.0
            # StringDType (same stable type, different numpy representation).
            # Reject <U* (FixedLengthUTF32) — no stable Zarr V3 spec.
            is_valid_string = actual_dtype == "object" or actual_dtype.startswith(
                "StringDType"
            )
            if not is_valid_string:
                errors.append(
                    f"Coordinate {coord} has wrong dtype: "
                    f"expected string (object/StringDType), got {actual_dtype}"
                )
        elif expected_dtype not in actual_dtype:
            errors.append(
                f"Coordinate {coord} has wrong dtype: "
                f"expected {expected_dtype}, got {actual_dtype}"
            )

    # -- data variables --
    missing_vars = set(required_vars) - set(ds.data_vars)
    if missing_vars:
        errors.append(f"Missing required data variables: {missing_vars}")

    expected_var_dims = ("epoch", "sid")
    for var in ds.data_vars:
        if ds[var].dims != expected_var_dims:
            errors.append(
                f"Data variable {var} has wrong dimensions: "
                f"expected {expected_var_dims}, got {ds[var].dims}"
            )

    # -- attributes --
    missing_attrs = REQUIRED_ATTRS - set(ds.attrs.keys())
    if missing_attrs:
        errors.append(f"Missing required attributes: {missing_attrs}")

    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )
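validate_dataset collects every violation before raising, so a caller sees the full list of problems in one pass instead of fixing them one at a time. That collect-then-raise pattern can be sketched standalone; the key check below is illustrative only, not the real dataset contract:

```python
def check_contract(obj: dict, required_keys: tuple[str, ...] = ("epoch", "sid")) -> None:
    """Collect every violation, then raise a single ValueError listing all of them."""
    errors: list[str] = []
    for key in required_keys:
        if key not in obj:
            errors.append(f"Missing required key: {key}")
    if errors:
        # One exception carrying every problem, formatted one per line
        raise ValueError(
            "Validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )

try:
    check_contract({"epoch": [0]})
    message = ""
except ValueError as e:
    message = str(e)  # lists every missing key in a single error
```

The same shape applies to the real validator: run all checks unconditionally, then raise once at the end.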

RINEX v3.04

RINEX v3.04 observation file reader.

Migrated from: gnssvodpy/rinexreader/rinex_reader.py

Changes from original:

- Updated imports to use canvod.readers.gnss_specs
- Added structured logging for LLM-friendly diagnostics
- Removed IcechunkPreprocessor calls (TODO: move to canvod-store)
- Preserved all other functionality

Classes:

- Rnxv3Header: Parse RINEX v3 headers
- Rnxv3Obs: Main reader class, converts RINEX to xarray Dataset

Rnxv3Header

Bases: BaseModel

Enhanced RINEX v3 header following the original implementation logic.

Key changes from previous version:

- date field is now datetime (like original)
- Uses the original parsing logic for __get_pgm_runby_date

Notes

This is a Pydantic BaseModel configured with ConfigDict (frozen, validate_assignment, arbitrary_types_allowed, str_strip_whitespace). Prefer :meth:from_file for construction.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
class Rnxv3Header(BaseModel):
    """Enhanced RINEX v3 header following the original implementation logic.

    Key changes from previous version:
    - date field is now datetime (like original)
    - Uses the original parsing logic for __get_pgm_runby_date

    Notes
    -----
    This is a Pydantic `BaseModel` configured with `ConfigDict` (frozen,
    validate_assignment, arbitrary_types_allowed, str_strip_whitespace). Prefer
    :meth:`from_file` for construction.

    """

    model_config = ConfigDict(
        frozen=True,
        validate_assignment=True,
        arbitrary_types_allowed=True,
        str_strip_whitespace=True,
    )

    # Required fields
    fpath: Path
    version: float
    filetype: str
    rinextype: str
    systems: str
    pgm: str
    run_by: str
    date: datetime
    marker_name: str
    observer: str
    agency: str
    receiver_number: str
    receiver_type: str
    receiver_version: str
    antenna_number: str
    antenna_type: str
    approx_position: list[pint.Quantity]
    antenna_position: list[pint.Quantity]
    t0: dict[str, datetime]
    signal_strength_unit: pint.Unit | str
    obs_codes_per_system: dict[str, list[str]]

    # Optional fields with defaults
    comment: str | None = None
    marker_number: int | None = None
    marker_type: str | None = None
    glonass_cod: str | None = None
    glonass_phs: str | None = None
    glonass_bis: str | None = None
    glonass_slot_freq_dict: dict[str, int] = Field(default_factory=dict)
    leap_seconds: pint.Quantity | None = None
    system_phase_shift: dict[str, dict[str, float | None]] = Field(default_factory=dict)

    @field_validator("marker_number", mode="before")
    @classmethod
    def parse_marker_number(cls, v: object) -> int | None:
        """Convert empty strings to None, parse valid integers."""
        if v is None or (isinstance(v, str) and not v.strip()):
            return None
        try:
            if not isinstance(v, (str, int, float)):
                return None
            return int(v)
        except (ValueError, TypeError):
            return None

    @classmethod
    def from_file(cls, fpath: Path) -> Self:
        """Create header from a RINEX file."""
        # External validation models handle file and version checks
        _ = RnxObsFileModel(fpath=fpath)

        try:
            header = gr.rinexheader(fpath)
        except (OSError, ValueError, TypeError) as e:
            msg = f"Failed to read RINEX header: {e}"
            raise ValueError(msg) from e

        cast(Any, RnxVersion3Model).version_must_be_3(header["version"])

        # Parse and create instance using original logic
        parsed_data = cls._parse_header_data(cast(dict[str, Any], header), fpath)
        return cls.model_validate(parsed_data)

    @staticmethod
    def _parse_header_data(
        header: dict[str, Any],
        fpath: Path,
    ) -> dict[str, Any]:
        """Parse raw header into structured data using original logic.

        Parameters
        ----------
        header : dict[str, Any]
            Raw header dictionary returned by `georinex`.
        fpath : Path
            Path to the RINEX file.

        Returns
        -------
        dict[str, Any]
            Parsed header data suitable for model validation.

        """
        data = {
            "fpath": fpath,
            "version": header.get("version", 3.0),
            "filetype": header.get("filetype", ""),
            "rinextype": header.get("rinextype", ""),
            "systems": header.get("systems", ""),
        }

        if "PGM / RUN BY / DATE" in header:
            pgm, run_by, date_dt = Rnxv3Header._get_pgm_runby_date(header)
            data.update(
                {
                    "pgm": pgm,
                    "run_by": run_by,
                    "date": date_dt,  # This is now a datetime object
                }
            )
        else:
            data.update(
                {
                    "pgm": "",
                    "run_by": "",
                    "date": datetime.now(UTC),  # Default to current time
                }
            )

        if "OBSERVER / AGENCY" in header:
            observer, agency = Rnxv3Header._get_observer_agency(header)
            data.update({"observer": observer, "agency": agency})
        else:
            data.update({"observer": "", "agency": ""})

        if "REC # / TYPE / VERS" in header:
            rec_num, rec_type, rec_version = Rnxv3Header._get_receiver_num_type_version(
                header
            )
            data.update(
                {
                    "receiver_number": rec_num,
                    "receiver_type": rec_type,
                    "receiver_version": rec_version,
                }
            )
        else:
            data.update(
                {"receiver_number": "", "receiver_type": "", "receiver_version": ""}
            )

        if "ANT # / TYPE" in header:
            ant_num, ant_type = Rnxv3Header._get_antenna_num_type(header)
            data.update({"antenna_number": ant_num, "antenna_type": ant_type})
        else:
            data.update({"antenna_number": "", "antenna_type": ""})

        # Parse positions with safe fallbacks
        pos_parts = header.get("APPROX POSITION XYZ", "0 0 0").split()
        delta_parts = header.get("ANTENNA: DELTA H/E/N", "0 0 0").split()

        def safe_float(s: str, default: float = 0.0) -> float:
            try:
                return float(s)
            except (ValueError, TypeError):
                return default

        pos_y = (
            safe_float(pos_parts[1]) * UREG.meters
            if len(pos_parts) > 1
            else 0.0 * UREG.meters
        )
        pos_z = (
            safe_float(pos_parts[2]) * UREG.meters
            if len(pos_parts) > POSITION_PARTS_MIN
            else 0.0 * UREG.meters
        )
        ant_y = (
            safe_float(delta_parts[1]) * UREG.meters
            if len(delta_parts) > 1
            else 0.0 * UREG.meters
        )
        ant_z = (
            safe_float(delta_parts[2]) * UREG.meters
            if len(delta_parts) > DELTA_PARTS_MIN
            else 0.0 * UREG.meters
        )

        data.update(
            {
                "approx_position": [
                    safe_float(pos_parts[0]) * UREG.meters,
                    pos_y,
                    pos_z,
                ],
                "antenna_position": [
                    safe_float(delta_parts[0]) * UREG.meters,
                    ant_y,
                    ant_z,
                ],
            }
        )

        if "TIME OF FIRST OBS" in header:
            data["t0"] = Rnxv3Header._get_time_of_first_obs(header)
        else:
            now = datetime.now(UTC)
            data["t0"] = {
                "UTC": now if now.tzinfo is not None else now.replace(tzinfo=UTC),
                "GPS": now,
            }

        # Signal strength unit
        data["signal_strength_unit"] = Rnxv3Header._get_signal_strength_unit(header)

        # Basic fields
        data.update(
            {
                "comment": header.get("COMMENT"),
                "marker_name": header.get("MARKER NAME", "").strip(),
                "marker_number": header.get("MARKER NUMBER"),
                "marker_type": header.get("MARKER TYPE"),
                "obs_codes_per_system": header.get("fields", {}),
            }
        )

        # Optional GLONASS fields using original methods
        if "GLONASS COD/PHS/BIS" in header:
            cod, phs, bis = Rnxv3Header._get_glonass_cod_phs_bis(header)
            data.update({"glonass_cod": cod, "glonass_phs": phs, "glonass_bis": bis})

        if "GLONASS SLOT / FRQ #" in header:
            data["glonass_slot_freq_dict"] = Rnxv3Header._get_glonass_slot_freq_num(
                header
            )

        # Leap seconds
        if "LEAP SECONDS" in header:
            leap_parts = header["LEAP SECONDS"].split()
            if leap_parts and leap_parts[0].lstrip("-").isdigit():
                data["leap_seconds"] = int(leap_parts[0]) * UREG.seconds

        # System phase shift using original method
        if "SYS / PHASE SHIFT" in header:
            data["system_phase_shift"] = Rnxv3Header._get_sys_phase_shift(header)
        else:
            data["system_phase_shift"] = {}

        return data

    @staticmethod
    def _get_pgm_runby_date(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, datetime]:
        """Parse ``PGM / RUN BY / DATE`` into program, run_by, and datetime.

        Based on the original __get_pgm_runby_date method.
        """
        header_value = header_dict.get("PGM / RUN BY / DATE", "")
        components = header_value.split()

        if not components:
            return "", "", datetime.now(UTC)

        pgm = components[0]
        run_by = components[1] if len(components) > PGM_RUNBY_MIN_COMPONENTS else ""

        # Original logic for extracting date components
        date = (
            [components[-3], components[-2], components[-1]]
            if len(components) > 1
            else None
        )

        if date:
            try:
                # Original parsing logic
                dt = datetime.strptime(
                    date[0] + date[1],
                    "%Y%m%d%H%M%S",
                )
                tz = pytz.timezone(date[2])  # e.g., "UTC"
                localized_date = tz.localize(dt)
                return pgm, run_by, localized_date
            except (ValueError, TypeError) as e:
                print(f"Warning: Could not parse date components {date}: {e}")
                return pgm, run_by, datetime.now(UTC)
        else:
            return pgm, run_by, datetime.now(UTC)

    @staticmethod
    def _get_observer_agency(header_dict: dict[str, Any]) -> tuple[str, str]:
        """Parse ``OBSERVER / AGENCY`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str]
            (observer, agency).

        """
        header_value = header_dict.get("OBSERVER / AGENCY", "")
        try:
            observer, agency = header_value.split(maxsplit=1)
            return observer, agency
        except ValueError:
            return "", ""

    @staticmethod
    def _get_receiver_num_type_version(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, str]:
        """Parse ``REC # / TYPE / VERS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str, str]
            (receiver_number, receiver_type, receiver_version).

        """
        header_value = header_dict.get("REC # / TYPE / VERS", "")
        components = header_value.split()

        if not components:
            return "", "", ""
        if len(components) == 1:
            return components[0], "", ""
        if len(components) == RECEIVER_COMPONENTS_SECOND:
            return components[0], components[1], ""
        return components[0], " ".join(components[1:-1]), components[-1]

    @staticmethod
    def _get_antenna_num_type(header_dict: dict[str, Any]) -> tuple[str, str]:
        """Parse ``ANT # / TYPE`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str]
            (antenna_number, antenna_type).

        """
        header_value = header_dict.get("ANT # / TYPE", "")
        components = header_value.split()

        if not components:
            return "", ""
        if len(components) == 1:
            return components[0], ""
        return components[0], " ".join(components[1:])

    @staticmethod
    def _get_time_of_first_obs(
        header_dict: dict[str, Any],
    ) -> dict[str, datetime]:
        """Parse ``TIME OF FIRST OBS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, datetime]
            Mapping of time system labels to datetimes.

        """
        header_value = header_dict.get("TIME OF FIRST OBS", "")
        components = header_value.split()

        if len(components) < TIME_OF_FIRST_OBS_MIN_COMPONENTS:
            now = datetime.now(UTC)
            return {"UTC": now, "GPS": now}

        try:
            year, month, day = map(int, components[:3])
            hour, minute = map(int, components[3:5])
            second = float(components[5])

            dt_gps = datetime(
                year,
                month,
                day,
                hour,
                minute,
                int(second),
                int((second - int(second)) * 1e6),
                tzinfo=UTC,
            )

            gps_utc_offset = timedelta(seconds=18)
            dt_utc = dt_gps - gps_utc_offset
            tz = pytz.timezone("UTC")

            return {"UTC": tz.localize(dt_utc), "GPS": dt_gps}

        except (ValueError, TypeError, IndexError):
            now = datetime.now(UTC)
            return {"UTC": now, "GPS": now}

    @staticmethod
    def _get_glonass_cod_phs_bis(
        header_dict: dict[str, Any],
    ) -> tuple[str, str, str]:
        """Parse ``GLONASS COD/PHS/BIS`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        tuple[str, str, str]
            (glonass_cod, glonass_phs, glonass_bis).

        """
        header_value = header_dict.get("GLONASS COD/PHS/BIS", "")
        components = header_value.split()

        if len(components) >= GLONASS_COD_PHS_MIN_COMPONENTS:
            c1c = f"{components[0]} {components[1]}"
            c2c = f"{components[2]} {components[3]}"
            c2p = f"{components[4]} {components[5]}"
            return c1c, c2c, c2p
        return "", "", ""

    @staticmethod
    def _get_glonass_slot_freq_num(
        header_dict: dict[str, Any],
    ) -> dict[str, int]:
        """Parse ``GLONASS SLOT / FRQ #`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, int]
            Mapping of slot to frequency number.

        """
        header_value = header_dict.get("GLONASS SLOT / FRQ #", "")
        components = header_value.split()

        result = {}
        for i in range(1, len(components), 2):  # Skip first component
            if i + 1 < len(components):
                try:
                    slot = components[i]
                    freq_num = int(components[i + 1])
                    result[slot] = freq_num
                except (ValueError, IndexError):
                    continue

        return result

    @staticmethod
    def _get_sys_phase_shift(
        header_dict: dict[str, Any],
    ) -> dict[str, dict[str, float | None]]:
        """Parse ``SYS / PHASE SHIFT`` records.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        dict[str, dict[str, float | None]]
            Mapping of system to signal phase shifts.

        """
        header_value = header_dict.get("SYS / PHASE SHIFT", "")
        components = header_value.split()

        sys_phase_shift_dict = defaultdict(dict)
        i = 0

        while i < len(components):
            if i >= len(components):
                break

            system_abbrv = components[i]

            if i + 1 >= len(components):
                break
            signal_code = components[i + 1]

            # Check if there's a phase shift value
            phase_shift = None
            if (
                i + 2 < len(components)
                and components[i + 2].replace(".", "", 1).replace("-", "", 1).isdigit()
            ):
                try:
                    phase_shift = float(components[i + 2])
                    i += 3
                except (ValueError, TypeError):
                    i += 2
            else:
                i += 2

            sys_phase_shift_dict[system_abbrv][signal_code] = phase_shift

        return {k: dict(v) for k, v in sys_phase_shift_dict.items()}

    @staticmethod
    def _get_signal_strength_unit(
        header_dict: dict[str, Any],
    ) -> pint.Unit | str:
        """Parse ``SIGNAL STRENGTH UNIT`` record.

        Parameters
        ----------
        header_dict : dict[str, Any]
            Raw header dictionary.

        Returns
        -------
        pint.Unit or str
            Parsed unit or a default string.

        """
        header_value = header_dict.get("SIGNAL STRENGTH UNIT", "").strip()

        # Using match statement like original
        match header_value:
            case "DBHZ":
                return UREG.dBHz
            case "DB":
                return UREG.dB
            case _:
                return header_value if header_value else "dB"

    @property
    def is_mixed_systems(self) -> bool:
        """Check if the RINEX file contains mixed GNSS systems."""
        return self.systems == "M"

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return (
            f"Rnxv3Header(file='{self.fpath.name}', "
            f"version={self.version}, "
            f"systems='{self.systems}')"
        )

    def __str__(self) -> str:
        """Return a human-readable header summary."""
        systems_str = "Mixed" if self.systems == "M" else self.systems
        return (
            f"RINEX v{self.version} Header\n"
            f"  File: {self.fpath.name}\n"
            f"  Marker: {self.marker_name}\n"
            f"  Systems: {systems_str}\n"
            f"  Receiver: {self.receiver_type}\n"
            f"  Date: {self.date.strftime('%Y-%m-%d %H:%M:%S %Z')}\n"
        )
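The TIME OF FIRST OBS handling above converts GPS time to UTC with a fixed 18-second offset, the leap-second count accumulated as of 2017. A minimal standalone sketch of that conversion (gps_to_utc is an illustrative helper, not part of the package):

```python
from datetime import datetime, timedelta, timezone

# GPS time runs ahead of UTC by the accumulated leap seconds (18 s since 2017)
GPS_UTC_OFFSET = timedelta(seconds=18)

def gps_to_utc(dt_gps: datetime) -> datetime:
    """Convert a GPS-timescale datetime to UTC by removing the leap-second offset."""
    return dt_gps - GPS_UTC_OFFSET

first_obs_gps = datetime(2025, 1, 1, 0, 0, 18, tzinfo=timezone.utc)
first_obs_utc = gps_to_utc(first_obs_gps)
```

Note the hard-coded offset is only valid for observations after the most recent leap second; a general converter would look the offset up from a leap-second table.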

is_mixed_systems property

Check if the RINEX file contains mixed GNSS systems.

parse_marker_number(v) classmethod

Convert empty strings to None, parse valid integers.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@field_validator("marker_number", mode="before")
@classmethod
def parse_marker_number(cls, v: object) -> int | None:
    """Convert empty strings to None, parse valid integers."""
    if v is None or (isinstance(v, str) and not v.strip()):
        return None
    try:
        if not isinstance(v, (str, int, float)):
            return None
        return int(v)
    except (ValueError, TypeError):
        return None

from_file(fpath) classmethod

Create header from a RINEX file.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@classmethod
def from_file(cls, fpath: Path) -> Self:
    """Create header from a RINEX file."""
    # External validation models handle file and version checks
    _ = RnxObsFileModel(fpath=fpath)

    try:
        header = gr.rinexheader(fpath)
    except (OSError, ValueError, TypeError) as e:
        msg = f"Failed to read RINEX header: {e}"
        raise ValueError(msg) from e

    cast(Any, RnxVersion3Model).version_must_be_3(header["version"])

    # Parse and create instance using original logic
    parsed_data = cls._parse_header_data(cast(dict[str, Any], header), fpath)
    return cls.model_validate(parsed_data)

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return (
        f"Rnxv3Header(file='{self.fpath.name}', "
        f"version={self.version}, "
        f"systems='{self.systems}')"
    )

__str__()

Return a human-readable header summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable header summary."""
    systems_str = "Mixed" if self.systems == "M" else self.systems
    return (
        f"RINEX v{self.version} Header\n"
        f"  File: {self.fpath.name}\n"
        f"  Marker: {self.marker_name}\n"
        f"  Systems: {systems_str}\n"
        f"  Receiver: {self.receiver_type}\n"
        f"  Date: {self.date.strftime('%Y-%m-%d %H:%M:%S %Z')}\n"
    )

Rnxv3Obs

Bases: GNSSDataReader

RINEX v3.04 observation reader.

Attributes

fpath : Path
    Path to the RINEX observation file.
polarization : str, default "RHCP"
    Polarization label for observables.
completeness_mode : {"strict", "warn", "off"}, default "strict"
    Behavior when epoch completeness checks fail.
expected_dump_interval : str or pint.Quantity, optional
    Expected file dump interval for completeness validation.
expected_sampling_interval : str or pint.Quantity, optional
    Expected sampling interval for completeness validation.
apply_overlap_filter : bool, default False
    Whether to filter overlapping signal groups.
overlap_preferences : dict[str, str], optional
    Preferred signals for overlap resolution.
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels.

Notes

Inherits fpath, its validator, and arbitrary_types_allowed from :class:GNSSDataReader.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
class Rnxv3Obs(GNSSDataReader):
    """RINEX v3.04 observation reader.

    Attributes
    ----------
    fpath : Path
        Path to the RINEX observation file.
    polarization : str, default "RHCP"
        Polarization label for observables.
    completeness_mode : {"strict", "warn", "off"}, default "strict"
        Behavior when epoch completeness checks fail.
    expected_dump_interval : str or pint.Quantity, optional
        Expected file dump interval for completeness validation.
    expected_sampling_interval : str or pint.Quantity, optional
        Expected sampling interval for completeness validation.
    apply_overlap_filter : bool, default False
        Whether to filter overlapping signal groups.
    overlap_preferences : dict[str, str], optional
        Preferred signals for overlap resolution.
    aggregate_glonass_fdma : bool, default True
        Whether to aggregate GLONASS FDMA channels.

    Notes
    -----
    Inherits ``fpath``, its validator, and ``arbitrary_types_allowed``
    from :class:`GNSSDataReader`.

    """

    model_config = ConfigDict(frozen=True)

    polarization: str = "RHCP"

    completeness_mode: Literal["strict", "warn", "off"] = "strict"
    expected_dump_interval: str | pint.Quantity | None = None
    expected_sampling_interval: str | pint.Quantity | None = None

    apply_overlap_filter: bool = False
    overlap_preferences: dict[str, str] | None = None

    aggregate_glonass_fdma: bool = True

    _header: Rnxv3Header = PrivateAttr()
    _signal_mapper: SignalIDMapper = PrivateAttr()

    _lines: list[str] = PrivateAttr()
    _file_hash: str = PrivateAttr()
    _cached_epoch_batches: list[tuple[int, int]] | None = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _post_init(self) -> Self:
        """Initialize derived state after validation."""
        # Load header once
        self._header = Rnxv3Header.from_file(self.fpath)

        # Initialize signal mapper
        self._signal_mapper = SignalIDMapper(
            aggregate_glonass_fdma=self.aggregate_glonass_fdma
        )

        # Cache file lines and compute the file hash
        self._lines = self._load_file()

        # Optionally auto-check completeness (reads the cached lines)
        if self.completeness_mode != "off":
            try:
                self.validate_epoch_completeness(
                    dump_interval=self.expected_dump_interval,
                    sampling_interval=self.expected_sampling_interval,
                )
            except MissingEpochError as e:
                if self.completeness_mode == "strict":
                    raise
                warnings.warn(str(e), RuntimeWarning, stacklevel=2)

        return self

    @property
    def header(self) -> Rnxv3Header:
        """Expose validated header (read-only).

        Returns
        -------
        Rnxv3Header
            Parsed and validated RINEX header.

        """
        return self._header

    def __str__(self) -> str:
        """Return a human-readable summary."""
        return (
            f"{self.__class__.__name__}:\n"
            f"  File Path: {self.fpath}\n"
            f"  Header: {self.header}\n"
            f"  Polarization: {self.polarization}\n"
        )

    def __repr__(self) -> str:
        """Return a concise representation for debugging."""
        return f"{self.__class__.__name__}(fpath={self.fpath})"

    def _load_file(self) -> list[str]:
        """Read file once, cache lines, and compute hash.

        Returns
        -------
        list[str]
            File contents split into lines.

        """
        if not hasattr(self, "_lines"):
            h = hashlib.sha256()
            with self.fpath.open("rb") as f:  # binary mode for consistent hash
                data = f.read()
                h.update(data)
                self._lines = data.decode("utf-8", errors="replace").splitlines()
            self._file_hash = h.hexdigest()[:16]  # short hash for storage
        return self._lines
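The hash-on-load scheme can be sketched in isolation; a minimal example (synthetic bytes, hypothetical helper name `short_file_hash`) of the truncated SHA-256 digest used for deduplication:

```python
import hashlib


def short_file_hash(data: bytes, length: int = 16) -> str:
    """Truncated SHA-256 hex digest, as used for deduplication."""
    return hashlib.sha256(data).hexdigest()[:length]


# Identical content hashes identically; different content differs.
h1 = short_file_hash(b"> 2024 01 01 00 00  0.0000000  0  4")
h2 = short_file_hash(b"> 2024 01 01 00 30  0.0000000  0  4")
print(h1, h2, h1 != h2)
```

Truncating to 16 hex characters keeps 64 bits of the digest, which is ample for deduplicating a file store while keeping keys short.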

    @property
    def file_hash(self) -> str:
        """Return cached SHA256 short hash of the file content.

        Returns
        -------
        str
            16-character short hash for deduplication.

        """
        return self._file_hash

    @property
    def start_time(self) -> datetime:
        """Return start time of observations from header.

        Returns
        -------
        datetime
            First observation timestamp.

        """
        return min(self.header.t0.values())

    @property
    def end_time(self) -> datetime:
        """Return end time of observations from last epoch.

        Returns
        -------
        datetime
            Last observation timestamp.

        """
        last_epoch = None
        for epoch in self.iter_epochs():
            last_epoch = epoch
        if last_epoch is not None:
            return self.get_datetime_from_epoch_record_info(last_epoch.info)
        return self.start_time

    @property
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers (G, R, E, C, J, S, I).

        """
        if self.header.systems == "M":
            return list(self.header.obs_codes_per_system.keys())
        return [self.header.systems]

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Returns
        -------
        int
            Total epoch count.

        """
        return len(self.get_epoch_record_batches())

    @property
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.

        """
        satellites = set()
        for epoch in self.iter_epochs():
            for sat in epoch.data:
                satellites.add(sat.sv)
        return len(satellites)

    def get_epoch_record_batches(
        self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
    ) -> list[tuple[int, int]]:
        """Get the start and end line numbers for each epoch in the file.

        Parameters
        ----------
        epoch_record_indicator : str, default '>'
            Character marking epoch record lines.

        Returns
        -------
        list of tuple of int
            List of (start_line, end_line) pairs for each epoch.

        """
        if self._cached_epoch_batches is not None:
            return self._cached_epoch_batches

        lines = self._load_file()
        starts = [
            i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
        ]
        starts.append(len(lines))  # Add EOF
        self._cached_epoch_batches = [
            (start, starts[i + 1])
            for i, start in enumerate(starts)
            if i + 1 < len(starts)
        ]
        return self._cached_epoch_batches
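The batching above can be illustrated standalone (synthetic lines; epoch records start with `>` per RINEX v3):

```python
# Each batch spans from its '>' line up to (but excluding) the next one.
lines = [
    "> 2024 01 01 00 00  0.0000000  0  2",
    "G01  20000000.000",
    "G02  21000000.000",
    "> 2024 01 01 00 00 30.0000000  0  1",
    "G01  20000100.000",
]
starts = [i for i, ln in enumerate(lines) if ln.startswith(">")]
starts.append(len(lines))  # sentinel for EOF
batches = [(s, starts[i + 1]) for i, s in enumerate(starts) if i + 1 < len(starts)]
print(batches)  # [(0, 3), (3, 5)]
```

Batch `(0, 3)` covers the epoch record line plus its two satellite data lines.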

    def parse_observation_slice(
        self,
        slice_text: str,
    ) -> tuple[float | None, int | None, int | None]:
        """Parse a RINEX observation slice into value, LLI, and SSI.

        Enhanced to handle both standard 16-character format and
        variable-length records.

        Parameters
        ----------
        slice_text : str
            Observation slice to parse.

        Returns
        -------
        tuple[float | None, int | None, int | None]
            Parsed (value, LLI, SSI) tuple.

        """
        if not slice_text or not slice_text.strip():
            return None, None, None

        try:
            # Method 1: Standard RINEX format with decimal at position -6
            if (
                len(slice_text) >= OBS_SLICE_MIN_LEN
                and len(slice_text) <= OBS_SLICE_MAX_LEN
                and slice_text[OBS_SLICE_DECIMAL_POS] == "."
            ):
                slice_chars = list(slice_text)
                ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
                lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

                # Convert LLI and SSI
                lli = int(lli) if lli.strip() and lli.isdigit() else None
                ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

                # Convert value
                value_str = "".join(slice_chars).strip()
                if value_str:
                    value = float(value_str)
                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        try:
            # Method 2: Flexible parsing for variable-length records
            slice_trimmed = slice_text.strip()
            if not slice_trimmed:
                return None, None, None

            # Look for a decimal point to identify the numeric value
            if "." in slice_trimmed:
                # Find the main numeric value (supports negative numbers)
                number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

                if number_match:
                    value = float(number_match.group(1))

                    # Check for LLI/SSI indicators after the number
                    remaining_part = slice_trimmed[number_match.end() :].strip()
                    lli = None
                    ssi = None

                    # Parse remaining characters as potential LLI/SSI
                    if remaining_part:
                        # Could be just SSI, or LLI followed by SSI
                        if len(remaining_part) == 1:
                            # Just one indicator - assume it's SSI
                            if remaining_part.isdigit():
                                ssi = int(remaining_part)
                        elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                            # Two or more characters - take last two as LLI, SSI
                            lli_char = remaining_part[-2]
                            ssi_char = remaining_part[-1]

                            if lli_char.isdigit():
                                lli = int(lli_char)
                            if ssi_char.isdigit():
                                ssi = int(ssi_char)

                    return value, lli, ssi

        except (ValueError, IndexError):
            pass

        # Method 3: Last resort - try simple float parsing
        try:
            simple_value = float(slice_text.strip())
            return simple_value, None, None
        except ValueError:
            pass

        return None, None, None
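A condensed sketch of the fixed-width branch (Method 1) with a regex fallback — field widths follow the standard RINEX F14.3 value plus one-character LLI and SSI layout; `parse_slice` is a simplified stand-in, not the method above:

```python
import re


def parse_slice(slice_text: str):
    """Sketch: 14-char value + LLI + SSI, with a flexible regex fallback."""
    if len(slice_text) == 16 and slice_text[-6] == ".":
        value = float(slice_text[:14])
        lli = int(slice_text[14]) if slice_text[14].isdigit() else None
        ssi = int(slice_text[15]) if slice_text[15].isdigit() else None
        return value, lli, ssi
    m = re.search(r"(-?\d+\.\d+)", slice_text)  # variable-length records
    return (float(m.group(1)), None, None) if m else (None, None, None)


print(parse_slice("  24567890.123 7"))  # blank LLI, SSI = 7
```

The decimal point of an F14.3 value sits six characters from the end of a 16-character slice, which is what the `OBS_SLICE_DECIMAL_POS` check above exploits.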

    def process_satellite_data(self, s: str) -> Satellite:
        """Process satellite data line into a Satellite object with observations.

        Handles variable-length observation records correctly by adaptively parsing
        based on the actual line length and content.
        """
        sv = s[:3].strip()
        satellite = Satellite(sv=sv)
        bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

        # Get the data part (after sv identifier)
        data_part = s[3:]

        # Process each observation adaptively
        for i, band in enumerate(bands_tbe):
            start_idx = i * 16
            end_idx = start_idx + 16

            # Check if we have enough data for this observation
            if start_idx >= len(data_part):
                # No more data available - create empty observation
                observation = Observation(
                    obs_type=band.split("|")[1][0],
                    value=None,
                    lli=None,
                    ssi=None,
                )
                satellite.add_observation(observation)
                continue

            # Extract the slice, but handle variable length
            if end_idx <= len(data_part):
                # Full 16-character slice available
                slice_data = data_part[start_idx:end_idx]
            else:
                # Partial slice - pad with spaces to maintain consistency
                available_slice = data_part[start_idx:]
                slice_data = available_slice.ljust(16)  # Pad with spaces if needed

            value, lli, ssi = self.parse_observation_slice(slice_data)

            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=value,
                lli=lli,
                ssi=ssi,
            )
            satellite.add_observation(observation)

        return satellite
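The adaptive slicing can be seen on a synthetic line: two full 16-character fields plus one missing observable, padded so downstream parsing sees a blank field:

```python
line = "G07  24567890.123 7  12345678.90147"
sv, data = line[:3].strip(), line[3:]
n_obs = 3  # suppose the header lists three observables for this system

fields = []
for i in range(n_obs):
    chunk = data[i * 16 : (i + 1) * 16]
    # Pad short or absent trailing fields to the full 16-character width.
    fields.append(chunk.ljust(16) if chunk else " " * 16)
print(sv, [f.strip() for f in fields])
```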

    @property
    def epochs(self) -> list[Rnxv3ObsEpochRecord]:
        """Materialize all epochs (legacy compatibility).

        Returns
        -------
        list of Rnxv3ObsEpochRecord
            All epochs in memory (prefer ``iter_epochs`` for memory efficiency).

        """
        return list(self.iter_epochs())

    def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
        """Yield epochs one by one instead of materializing the whole list.

        Yields
        ------
        Rnxv3ObsEpochRecord
            Each epoch with timestamp and satellite observations.

        """
        for start, end in self.get_epoch_record_batches():
            try:
                info = Rnxv3ObsEpochRecordLineModel.model_validate(
                    {"epoch": self._lines[start]}
                )

                # Skip event epochs (flag 2-6: special records, not observations)
                if info.epoch_flag > 1:
                    continue

                # Filter out blank/whitespace-only lines from data slice
                data = [line for line in self._lines[start + 1 : end] if line.strip()]
                epoch = Rnxv3ObsEpochRecord(
                    info=info,
                    data=[self.process_satellite_data(line) for line in data],
                )
                yield epoch
            except (InvalidEpochError, IncompleteEpochError, ValueError):
                # Skip epochs with validation errors (invalid SV, malformed data,
                # pydantic ValidationError inherits from ValueError)
                pass

    def iter_epochs_in_range(
        self,
        start: datetime,
        end: datetime,
    ) -> Iterable[Rnxv3ObsEpochRecord]:
        """Yield epochs lazily that fall into the given datetime range.

        Parameters
        ----------
        start : datetime
            Start of time range (inclusive)
        end : datetime
            End of time range (inclusive)

        Yields
        ------
        Rnxv3ObsEpochRecord
            Epochs within the time range.

        """
        for epoch in self.iter_epochs():
            dt = self.get_datetime_from_epoch_record_info(epoch.info)
            if start <= dt <= end:
                yield epoch

    def get_datetime_from_epoch_record_info(
        self,
        epoch_record_info: Rnxv3ObsEpochRecordLineModel,
    ) -> datetime:
        """Convert epoch record info to datetime object.

        Parameters
        ----------
        epoch_record_info : Rnxv3ObsEpochRecordLineModel
            Parsed epoch record line

        Returns
        -------
        datetime
            Timestamp from epoch record

        """
        return datetime(
            year=int(epoch_record_info.year),
            month=int(epoch_record_info.month),
            day=int(epoch_record_info.day),
            hour=int(epoch_record_info.hour),
            minute=int(epoch_record_info.minute),
            second=int(epoch_record_info.seconds),
            tzinfo=UTC,
        )

    @staticmethod
    def epochrecordinfo_dt_to_numpy_dt(
        epch: Rnxv3ObsEpochRecord,
    ) -> np.datetime64:
        """Convert Python datetime to numpy datetime64[ns].

        Parameters
        ----------
        epch : Rnxv3ObsEpochRecord
            Epoch record containing timestamp info

        Returns
        -------
        np.datetime64
            Numpy datetime64 with nanosecond precision

        """
        dt = datetime(
            year=int(epch.info.year),
            month=int(epch.info.month),
            day=int(epch.info.day),
            hour=int(epch.info.hour),
            minute=int(epch.info.minute),
            second=int(epch.info.seconds),
            tzinfo=UTC,
        )
        # np.datetime64 doesn't support timezone info, but datetime is already UTC
        # Convert to naive datetime (UTC) to avoid warning
        return np.datetime64(dt.replace(tzinfo=None), "ns")
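The timezone handling deserves a standalone sketch: `np.datetime64` rejects timezone-aware datetimes, so the UTC timestamp is made naive first (the instant itself is unchanged):

```python
from datetime import datetime, timezone

import numpy as np

# An aware UTC timestamp, as produced from the epoch record fields.
dt = datetime(2024, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

# Strip tzinfo before conversion; the wall-clock value is already UTC.
ts = np.datetime64(dt.replace(tzinfo=None), "ns")
print(ts)  # 2024-01-01T12:30:00.000000000
```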

    def _epoch_datetimes(self) -> list[datetime]:
        """Extract epoch datetimes from the file.

        Uses the same epoch parsing logic already implemented.
        """
        dts: list[datetime] = []

        for start, _end in self.get_epoch_record_batches():
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )
            dts.append(
                datetime(
                    year=int(info.year),
                    month=int(info.month),
                    day=int(info.day),
                    hour=int(info.hour),
                    minute=int(info.minute),
                    second=int(info.seconds),
                    tzinfo=UTC,
                )
            )
        return dts

    def infer_sampling_interval(self) -> pint.Quantity | None:
        """Infer sampling interval from consecutive epoch deltas.

        Returns
        -------
        pint.Quantity or None
            Sampling interval in seconds, or None if cannot be inferred

        """
        dts = self._epoch_datetimes()
        if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
            return None
        # Compute deltas
        deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
        if not deltas:
            return None
        # Pick the most common delta (robust to an occasional missing epoch)
        seconds = Counter(
            int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
        )
        if not seconds:
            return None
        mode_seconds, _ = seconds.most_common(1)[0]
        return (mode_seconds * UREG.second).to(UREG.seconds)

    def infer_dump_interval(
        self, sampling_interval: pint.Quantity | None = None
    ) -> pint.Quantity | None:
        """Infer the intended dump interval for the RINEX file.

        Parameters
        ----------
        sampling_interval : pint.Quantity, optional
            Known sampling interval. If provided, returns (#epochs * sampling_interval)

        Returns
        -------
        pint.Quantity or None
            Dump interval in seconds, or None if cannot be inferred

        """
        idx = self.get_epoch_record_batches()
        n_epochs = len(idx)
        if n_epochs == 0:
            return None

        if sampling_interval is not None:
            return (n_epochs * sampling_interval).to(UREG.seconds)

        # Fallback: time coverage inclusive (last - first) + typical step
        dts = self._epoch_datetimes()
        if len(dts) == 0:
            return None
        if len(dts) == 1:
            # single epoch: treat as 1 * unknown step (cannot infer)
            return None

        # Estimate step from data
        est_step = self.infer_sampling_interval()
        if est_step is None:
            return None

        # Inclusive coverage often equals (n_epochs - 1) * step; intended
        # dump interval is n_epochs * step.
        return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

    def validate_epoch_completeness(
        self,
        dump_interval: str | pint.Quantity | None = None,
        sampling_interval: str | pint.Quantity | None = None,
    ) -> None:
        """Validate that the number of epochs matches the expected dump interval.

        Parameters
        ----------
        dump_interval : str or pint.Quantity, optional
            Expected file dump interval. If None, inferred from epochs.
        sampling_interval : str or pint.Quantity, optional
            Expected sampling interval. If None, inferred from epochs.

        Returns
        -------
        None

        Raises
        ------
        MissingEpochError
            If total sampling time doesn't match dump interval
        ValueError
            If intervals cannot be inferred

        """
        # Normalize/Infer sampling interval
        if sampling_interval is None:
            inferred = self.infer_sampling_interval()
            if inferred is None:
                msg = "Could not infer sampling interval from epochs"
                raise ValueError(msg)
            sampling_interval = inferred
        # normalize to pint
        elif not isinstance(sampling_interval, pint.Quantity):
            sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

        # Normalize/Infer dump interval
        if dump_interval is None:
            inferred_dump = self.infer_dump_interval(
                sampling_interval=sampling_interval
            )
            if inferred_dump is None:
                msg = "Could not infer dump interval from file"
                raise ValueError(msg)
            dump_interval = inferred_dump
        elif not isinstance(dump_interval, pint.Quantity):
            # Accept '15 min', '1h', etc.
            dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

        # Build inputs for the validator model
        epoch_indices = self.get_epoch_record_batches()

        # This throws MissingEpochError automatically if inconsistent
        cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
            epoch_records_indeces=epoch_indices,
            rnx_file_dump_interval=dump_interval,
            sampling_interval=sampling_interval,
        )

    def filter_by_overlapping_groups(
        self,
        ds: xr.Dataset,
        group_preference: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Filter overlapping bands using per-group preferences.

        Parameters
        ----------
        ds : xr.Dataset
            Dataset with `sid` dimension and signal properties.
        group_preference : dict[str, str], optional
            Mapping of overlap group to preferred band.

        Returns
        -------
        xr.Dataset
            Dataset filtered to preferred overlapping bands.

        """
        if group_preference is None:
            group_preference = {
                "L1_E1_B1I": "L1",
                "L5_E5a": "L5",
                "L2_E5b_B2b": "L2",
            }

        keep = []
        for sid in ds.sid.values:
            parts = str(sid).split("|")
            band = parts[1] if len(parts) >= 2 else ""
            group = self._signal_mapper.get_overlapping_group(band)
            if group and group in group_preference:
                if band == group_preference[group]:
                    keep.append(sid)
            else:
                keep.append(sid)
        return ds.sel(sid=keep)
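A self-contained sketch of the preference filter with a stubbed group lookup (synthetic SIDs; `group_of` stands in for `SignalIDMapper.get_overlapping_group`):

```python
group_of = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
preference = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}
sids = ["G01|L1|C", "E03|E1|C", "G01|L5|Q", "E03|E5a|Q", "R05|G1|C"]

keep = []
for sid in sids:
    band = sid.split("|")[1]
    group = group_of.get(band)
    if group in preference:
        if band == preference[group]:  # keep only the preferred band
            keep.append(sid)
    else:
        keep.append(sid)  # bands outside any overlap group pass through
print(keep)  # ['G01|L1|C', 'G01|L5|Q', 'R05|G1|C']
```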

    def _precompute_sids_from_header(
        self,
    ) -> tuple[list[str], dict[str, dict[str, object]]]:
        """Build sorted SID list and properties from header info alone.

        Uses the header's obs_codes_per_system and static constellation
        SV lists to pre-compute the full theoretical SID set, eliminating
        the discovery pass.

        Returns
        -------
        sorted_sids : list[str]
            Sorted list of signal IDs.
        sid_properties : dict[str, dict[str, object]]
            Mapping of SID to its properties (sv, system, band, code,
            freq_center, freq_min, freq_max, bandwidth, overlapping_group).

        """
        mapper = self._signal_mapper
        signal_ids: set[str] = set()
        sid_properties: dict[str, dict[str, object]] = {}

        # Pre-compute pint arithmetic once per unique band
        band_freq_cache: dict[str, tuple[float, float, float, float]] = {}

        for system, obs_codes in self.header.obs_codes_per_system.items():
            svs = _get_constellation_svs(system)

            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]

                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )

                # Cache frequency arithmetic per band
                if band_name not in band_freq_cache:
                    center_frequency = mapper.get_band_frequency(band_name)
                    bandwidth = mapper.get_band_bandwidth(band_name)

                    if center_frequency is not None and bandwidth is not None:
                        bw = bandwidth[0] if isinstance(bandwidth, list) else bandwidth
                        freq_min = center_frequency - (bw / 2.0)
                        freq_max = center_frequency + (bw / 2.0)
                        band_freq_cache[band_name] = (
                            float(center_frequency),
                            float(freq_min),
                            float(freq_max),
                            float(bw),
                        )
                    else:
                        band_freq_cache[band_name] = (
                            np.nan,
                            np.nan,
                            np.nan,
                            np.nan,
                        )

                freq_center, freq_min, freq_max, bw = band_freq_cache[band_name]
                overlapping_group = mapper.get_overlapping_group(band_name)

                sid_suffix = "|" + band_name + "|" + code_char

                for sv in svs:
                    sid = sv + sid_suffix
                    if sid not in signal_ids:
                        signal_ids.add(sid)
                        sid_properties[sid] = {
                            "sv": sv,
                            "system": system,
                            "band": band_name,
                            "code": code_char,
                            "freq_center": freq_center,
                            "freq_min": freq_min,
                            "freq_max": freq_max,
                            "bandwidth": bw,
                            "overlapping_group": overlapping_group,
                        }

        sorted_sids = sorted(signal_ids)
        return sorted_sids, {s: sid_properties[s] for s in sorted_sids}
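SID construction from header obs codes can be sketched with synthetic inputs (each code such as `"S1C"` splits into observation type, band digit, and tracking code):

```python
# Hypothetical stand-ins for SYSTEM_BANDS, the header obs codes, and SV lists.
system_bands = {"G": {"1": "L1", "2": "L2"}}
obs_codes = {"G": ["S1C", "S2W"]}
svs = ["G01", "G02"]

sids = []
for sys, codes in obs_codes.items():
    for code in codes:
        band = system_bands[sys].get(code[1], f"UnknownBand{code[1]}")
        for sv in svs:
            sids.append(f"{sv}|{band}|{code[2]}")
sids.sort()
print(sids)  # ['G01|L1|C', 'G01|L2|W', 'G02|L1|C', 'G02|L2|W']
```

Because the set is derived from the header alone, the theoretical SID axis is known before any observation line is parsed, which is what removes the discovery pass.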

    def _create_dataset_single_pass(self) -> xr.Dataset:
        """Create xarray Dataset in a single pass over the file.

        Pre-allocates arrays using header-derived SID set and epoch count,
        then fills them by parsing observations inline without Pydantic
        models or function-call overhead.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and standard variables.

        """
        lines = self._load_file()
        epoch_batches = self.get_epoch_record_batches()
        n_epochs = len(epoch_batches)

        sorted_sids, sid_properties = self._precompute_sids_from_header()
        n_sids = len(sorted_sids)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}

        # Pre-allocate arrays
        timestamps = np.empty(n_epochs, dtype="datetime64[ns]")
        snr = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["SNR"])
        pseudo = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Pseudorange"])
        phase = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Phase"])
        doppler = np.full((n_epochs, n_sids), np.nan, dtype=DTYPES["Doppler"])
        lli = np.full((n_epochs, n_sids), -1, dtype=DTYPES["LLI"])
        ssi = np.full((n_epochs, n_sids), -1, dtype=DTYPES["SSI"])

        # Build obs_code → (obs_type, sid_suffix) lookup per system
        mapper = self._signal_mapper
        system_obs_lut: dict[str, list[tuple[str, str]]] = {}
        for system, obs_codes in self.header.obs_codes_per_system.items():
            lut: list[tuple[str, str]] = []
            for obs_code in obs_codes:
                if len(obs_code) < 3:
                    lut.append(("", ""))
                    continue
                band_num = obs_code[1]
                code_char = obs_code[2]
                band_name = mapper.SYSTEM_BANDS.get(system, {}).get(
                    band_num, f"UnknownBand{band_num}"
                )
                obs_type = obs_code[0]
                lut.append((obs_type, "|" + band_name + "|" + code_char))
            system_obs_lut[system] = lut

        # Single pass over all epochs — skip unparseable epoch lines
        valid_mask = np.ones(n_epochs, dtype=bool)
        for t_idx, (start, end) in enumerate(epoch_batches):
            epoch_line = lines[start]

            # Inline epoch parsing (no Pydantic model)
            m = _EPOCH_RE.match(epoch_line)
            if m is None:
                valid_mask[t_idx] = False
                continue

            year, month, day = int(m[1]), int(m[2]), int(m[3])
            hour, minute = int(m[4]), int(m[5])
            seconds = float(m[6])
            sec_int = int(seconds)
            usec = int((seconds - sec_int) * 1_000_000)
            ts = np.datetime64(
                f"{year:04d}-{month:02d}-{day:02d}"
                f"T{hour:02d}:{minute:02d}:{sec_int:02d}",
                "ns",
            )
            ts += np.timedelta64(usec, "us")
            timestamps[t_idx] = ts

            # Parse satellite data lines inline
            for line_idx in range(start + 1, end):
                sat_line = lines[line_idx]
                if len(sat_line) < 3:
                    continue
                sv = sat_line[:3].strip()
                if not sv:
                    continue
                system = sv[0]
                lut_list = system_obs_lut.get(system)
                if lut_list is None:
                    continue

                data_part = sat_line[3:]
                data_part_len = len(data_part)

                for i, (obs_type, sid_suffix) in enumerate(lut_list):
                    if not obs_type:
                        continue

                    col_start = i * 16
                    if col_start >= data_part_len:
                        break

                    sid_key = sv + sid_suffix
                    s_idx = sid_to_idx.get(sid_key)
                    if s_idx is None:
                        continue

                    col_end = col_start + 16
                    slice_text = data_part[col_start:col_end]

                    value, obs_lli, obs_ssi = _parse_obs_fast(slice_text)
                    if value is None:
                        continue

                    if obs_type == "S":
                        if value != 0:
                            snr[t_idx, s_idx] = value
                    elif obs_type == "C":
                        pseudo[t_idx, s_idx] = value
                    elif obs_type == "L":
                        phase[t_idx, s_idx] = value
                    elif obs_type == "D":
                        doppler[t_idx, s_idx] = value

                    if obs_lli is not None:
                        lli[t_idx, s_idx] = obs_lli
                    if obs_ssi is not None:
                        ssi[t_idx, s_idx] = obs_ssi

        # Drop epochs that failed to parse
        if not valid_mask.all():
            timestamps = timestamps[valid_mask]
            snr = snr[valid_mask]
            pseudo = pseudo[valid_mask]
            phase = phase[valid_mask]
            doppler = doppler[valid_mask]
            lli = lli[valid_mask]
            ssi = ssi[valid_mask]

        # Build coordinate arrays from pre-computed properties
        sv_list = np.array(
            [sid_properties[sid]["sv"] for sid in sorted_sids], dtype=object
        )
        constellation_list = np.array(
            [sid_properties[sid]["system"] for sid in sorted_sids], dtype=object
        )
        band_list = np.array(
            [sid_properties[sid]["band"] for sid in sorted_sids], dtype=object
        )
        code_list = np.array(
            [sid_properties[sid]["code"] for sid in sorted_sids], dtype=object
        )
        freq_center_list = [sid_properties[sid]["freq_center"] for sid in sorted_sids]
        freq_min_list = [sid_properties[sid]["freq_min"] for sid in sorted_sids]
        freq_max_list = [sid_properties[sid]["freq_max"] for sid in sorted_sids]

        signal_id_coord = xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        )
        coords = {
            "epoch": ("epoch", timestamps, COORDS_METADATA["epoch"]),
            "sid": signal_id_coord,
            "sv": ("sid", sv_list, COORDS_METADATA["sv"]),
            "system": ("sid", constellation_list, COORDS_METADATA["system"]),
            "band": ("sid", band_list, COORDS_METADATA["band"]),
            "code": ("sid", code_list, COORDS_METADATA["code"]),
            "freq_center": (
                "sid",
                np.asarray(freq_center_list, dtype=DTYPES["freq_center"]),
                COORDS_METADATA["freq_center"],
            ),
            "freq_min": (
                "sid",
                np.asarray(freq_min_list, dtype=DTYPES["freq_min"]),
                COORDS_METADATA["freq_min"],
            ),
            "freq_max": (
                "sid",
                np.asarray(freq_max_list, dtype=DTYPES["freq_max"]),
                COORDS_METADATA["freq_max"],
            ),
        }

        if self.header.signal_strength_unit == UREG.dBHz:
            snr_meta = CN0_METADATA
        else:
            snr_meta = SNR_METADATA

        ds = xr.Dataset(
            data_vars={
                "SNR": (["epoch", "sid"], snr, snr_meta),
                "Pseudorange": (
                    ["epoch", "sid"],
                    pseudo,
                    OBSERVABLES_METADATA["Pseudorange"],
                ),
                "Phase": (
                    ["epoch", "sid"],
                    phase,
                    OBSERVABLES_METADATA["Phase"],
                ),
                "Doppler": (
                    ["epoch", "sid"],
                    doppler,
                    OBSERVABLES_METADATA["Doppler"],
                ),
                "LLI": (
                    ["epoch", "sid"],
                    lli,
                    OBSERVABLES_METADATA["LLI"],
                ),
                "SSI": (
                    ["epoch", "sid"],
                    ssi,
                    OBSERVABLES_METADATA["SSI"],
                ),
            },
            coords=coords,
            attrs={**self._build_attrs()},
        )

        if self.apply_overlap_filter:
            ds = self.filter_by_overlapping_groups(ds, self.overlap_preferences)

        return ds

    def create_rinex_netcdf_with_signal_id(
        self,
        start: datetime | None = None,
        end: datetime | None = None,
    ) -> xr.Dataset:
        """Create a NetCDF dataset with signal IDs.

        Always uses the fast single-pass path.  Optionally restricts to
        epochs within a datetime range via post-filtering.

        Parameters
        ----------
        start : datetime, optional
            Start of time range (inclusive).
        end : datetime, optional
            End of time range (inclusive).

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid).

        """
        ds = self._create_dataset_single_pass()

        if start or end:
            ds = ds.sel(epoch=slice(start, end))

        return ds

    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert RINEX observations to xarray.Dataset with signal ID structure.

        Parameters
        ----------
        keep_data_vars : list of str or None, optional
            Data variables to include in dataset. Defaults to config value.

        Other Parameters
        ----------------
        outname : Path or str, optional
            If provided, saves dataset to this file path.
        write_global_attrs : bool, default False
            If True, adds comprehensive global attributes.
        pad_global_sid : bool, default True
            If True, pads to global signal ID space.
        strip_fillval : bool, default True
            If True, removes fill values.
        add_future_datavars : bool, default True
            If True, adds placeholder variables for future data.
        keep_sids : list of str or None, default None
            If provided, filters/pads dataset to these specific SIDs.
            If None and pad_global_sid=True, pads to all possible SIDs.

        Returns
        -------
        xr.Dataset
            Dataset with dimensions (epoch, sid) and requested data variables

        """
        outname = cast(Path | str | None, kwargs.pop("outname", None))
        write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
        pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
        strip_fillval = bool(kwargs.pop("strip_fillval", True))
        add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
        keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

        if keep_data_vars is None:
            from canvod.utils.config import load_config

            keep_data_vars = load_config().processing.processing.keep_rnx_vars

        ds = self.create_rinex_netcdf_with_signal_id()

        # drop unwanted vars
        for var in list(ds.data_vars):
            if var not in keep_data_vars:
                ds = ds.drop_vars([var])

        if pad_global_sid:
            from canvod.auxiliary.preprocessing import pad_to_global_sid

            # Pad/filter to specified sids or all possible sids
            ds = pad_to_global_sid(ds, keep_sids=keep_sids)

        if strip_fillval:
            from canvod.auxiliary.preprocessing import strip_fillvalue

            ds = strip_fillvalue(ds)

        if add_future_datavars:
            # Placeholder: adding future data variables is not yet implemented
            pass

        if write_global_attrs:
            ds.attrs.update(self._create_comprehensive_attrs())

        ds.attrs.update(self._build_attrs())

        if outname:
            from canvod.utils.config import load_config as _load_config

            comp = _load_config().processing.compression
            encoding = {
                var: {"zlib": comp.zlib, "complevel": comp.complevel}
                for var in ds.data_vars
            }
            ds.to_netcdf(str(outname), encoding=encoding)

        # Validate output structure for pipeline compatibility
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

    def validate_rinex_304_compliance(
        self,
        ds: xr.Dataset | None = None,
        strict: bool = False,
        print_report: bool = True,
    ) -> dict[str, list[str]]:
        """Run enhanced RINEX 3.04 specification validation.

        Validates:
        1. System-specific observation codes
        2. GLONASS mandatory fields (slot/frequency, biases)
        3. Phase shift records (RINEX 3.01+)
        4. Observation value ranges

        Parameters
        ----------
        ds : xr.Dataset, optional
            Dataset to validate. If None, creates one from current file.
        strict : bool
            If True, raise ValueError on validation failures
        print_report : bool
            If True, print validation report to console

        Returns
        -------
        dict[str, list[str]]
            Validation results by category

        Examples
        --------
        >>> reader = Rnxv3Obs(fpath="station.24o")
        >>> results = reader.validate_rinex_304_compliance()
        >>> # Or validate a specific dataset
        >>> ds = reader.to_ds()
        >>> results = reader.validate_rinex_304_compliance(ds=ds)

        """
        if ds is None:
            ds = self.to_ds(write_global_attrs=False)

        # Prepare header dict for validators
        header_dict: dict[str, Any] = {
            "obs_codes_per_system": self.header.obs_codes_per_system,
        }

        # Add GLONASS-specific headers if available
        if hasattr(self.header, "glonass_slot_frq"):
            header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

        if hasattr(self.header, "glonass_cod_phs_bis"):
            header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

        if hasattr(self.header, "phase_shift"):
            header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

        # Run validation
        results = RINEX304ComplianceValidator.validate_all(
            ds=ds, header_dict=header_dict, strict=strict
        )

        if print_report:
            RINEX304ComplianceValidator.print_validation_report(results)

        return results

    def _create_comprehensive_attrs(self) -> dict[str, object]:
        attrs: dict[str, object] = {
            "File Path": str(self.fpath),
            "File Type": self.header.filetype,
            "RINEX Version": self.header.version,
            "RINEX Type": self.header.rinextype,
            "Observer": self.header.observer,
            "Agency": self.header.agency,
            "Date": self.header.date.isoformat(),
            "Marker Name": self.header.marker_name,
            "Marker Number": self.header.marker_number,
            "Marker Type": self.header.marker_type,
            "Approximate Position": (
                f"(X = {self.header.approx_position[0].magnitude} "
                f"{self.header.approx_position[0].units:~}, "
                f"Y = {self.header.approx_position[1].magnitude} "
                f"{self.header.approx_position[1].units:~}, "
                f"Z = {self.header.approx_position[2].magnitude} "
                f"{self.header.approx_position[2].units:~})"
            ),
            "Receiver Type": self.header.receiver_type,
            "Receiver Version": self.header.receiver_version,
            "Receiver Number": self.header.receiver_number,
            "Antenna Type": self.header.antenna_type,
            "Antenna Number": self.header.antenna_number,
            "Antenna Position": (
                f"(X = {self.header.antenna_position[0].magnitude} "
                f"{self.header.antenna_position[0].units:~}, "
                f"Y = {self.header.antenna_position[1].magnitude} "
                f"{self.header.antenna_position[1].units:~}, "
                f"Z = {self.header.antenna_position[2].magnitude} "
                f"{self.header.antenna_position[2].units:~})"
            ),
            "Program": self.header.pgm,
            "Run By": self.header.run_by,
            "Time of First Observation": json.dumps(
                {k: v.isoformat() for k, v in self.header.t0.items()}
            ),
            "GLONASS COD": self.header.glonass_cod,
            "GLONASS PHS": self.header.glonass_phs,
            "GLONASS BIS": self.header.glonass_bis,
            "GLONASS Slot Frequency Dict": json.dumps(
                self.header.glonass_slot_freq_dict
            ),
            "Leap Seconds": f"{self.header.leap_seconds:~}",
        }
        return attrs
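The single-pass construction in _create_dataset_single_pass relies on pre-allocating NaN-filled (epoch, sid) arrays and writing each parsed observation straight into its cell via a dict lookup, so missing observations need no special handling. A minimal sketch of that fill pattern (the SID strings are made up for illustration):

```python
import numpy as np

# Pre-allocate a NaN-filled (epoch, sid) array, as the single-pass reader does
n_epochs, n_sids = 3, 2
snr = np.full((n_epochs, n_sids), np.nan, dtype=np.float32)

# Map signal IDs to column indices once, then fill cells by O(1) lookup
sid_to_idx = {"G01|L1|C": 0, "G02|L1|C": 1}
snr[0, sid_to_idx["G02|L1|C"]] = 45.0

# Unfilled cells stay NaN, marking missing observations
```

This avoids any per-observation object allocation; the arrays are handed directly to xarray at the end.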

header property

Expose validated header (read-only).

Returns

Rnxv3Header : Parsed and validated RINEX header.

file_hash property

Return cached SHA256 short hash of the file content.

Returns

str : 16-character short hash for deduplication.

start_time property

Return start time of observations from header.

Returns

datetime : First observation timestamp.

end_time property

Return end time of observations from last epoch.

Returns

datetime : Last observation timestamp.

systems property

Return list of GNSS systems in file.

Returns

list of str : System identifiers (G, R, E, C, J, S, I).

num_epochs property

Return number of epochs in file.

Returns

int : Total epoch count.

num_satellites property

Return total number of unique satellites observed.

Returns

int : Count of unique satellite vehicles across all systems.

epochs property

Materialize all epochs (legacy compatibility).

Returns

list of Rnxv3ObsEpochRecord : All epochs in memory (use iter_epochs for efficiency).

__str__()

Return a human-readable summary.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __str__(self) -> str:
    """Return a human-readable summary."""
    return (
        f"{self.__class__.__name__}:\n"
        f"  File Path: {self.fpath}\n"
        f"  Header: {self.header}\n"
        f"  Polarization: {self.polarization}\n"
    )

__repr__()

Return a concise representation for debugging.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def __repr__(self) -> str:
    """Return a concise representation for debugging."""
    return f"{self.__class__.__name__}(fpath={self.fpath})"

get_epoch_record_batches(epoch_record_indicator=EPOCH_RECORD_INDICATOR)

Get the start and end line numbers for each epoch in the file.

Parameters

epoch_record_indicator : str, default '>'
    Character marking epoch record lines.

Returns

list of tuple of int : List of (start_line, end_line) pairs for each epoch.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_epoch_record_batches(
    self, epoch_record_indicator: str = EPOCH_RECORD_INDICATOR
) -> list[tuple[int, int]]:
    """Get the start and end line numbers for each epoch in the file.

    Parameters
    ----------
    epoch_record_indicator : str, default '>'
        Character marking epoch record lines.

    Returns
    -------
    list of tuple of int
        List of (start_line, end_line) pairs for each epoch.

    """
    if self._cached_epoch_batches is not None:
        return self._cached_epoch_batches

    lines = self._load_file()
    starts = [
        i for i, line in enumerate(lines) if line.startswith(epoch_record_indicator)
    ]
    starts.append(len(lines))  # Add EOF
    self._cached_epoch_batches = [
        (start, starts[i + 1])
        for i, start in enumerate(starts)
        if i + 1 < len(starts)
    ]
    return self._cached_epoch_batches
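The batching logic can be seen on a toy file: collect the indices of lines starting with '>', append the file length as an EOF sentinel, and pair consecutive indices. A self-contained sketch (the observation lines are made up):

```python
lines = [
    "> 2024 01 01 00 00  0.0000000  0  2",
    "G01  24178587.236 7",
    "G02  22034455.118 6",
    "> 2024 01 01 00 00 30.0000000  0  1",
    "G01  24178601.410 7",
]

# Indices of epoch record lines, plus EOF as a sentinel
starts = [i for i, line in enumerate(lines) if line.startswith(">")]
starts.append(len(lines))

# Pair consecutive starts into (start_line, end_line) half-open batches
batches = [(starts[i], starts[i + 1]) for i in range(len(starts) - 1)]
# batches == [(0, 3), (3, 5)]
```

Each batch covers the epoch record line plus its satellite data lines, which is exactly what the single-pass parser iterates over.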

parse_observation_slice(slice_text)

Parse a RINEX observation slice into value, LLI, and SSI.

Enhanced to handle both standard 16-character format and variable-length records.

Parameters

slice_text : str
    Observation slice to parse.

Returns

tuple[float | None, int | None, int | None] : Parsed (value, LLI, SSI) tuple.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def parse_observation_slice(
    self,
    slice_text: str,
) -> tuple[float | None, int | None, int | None]:
    """Parse a RINEX observation slice into value, LLI, and SSI.

    Enhanced to handle both standard 16-character format and
    variable-length records.

    Parameters
    ----------
    slice_text : str
        Observation slice to parse.

    Returns
    -------
    tuple[float | None, int | None, int | None]
        Parsed (value, LLI, SSI) tuple.

    """
    if not slice_text or not slice_text.strip():
        return None, None, None

    try:
        # Method 1: Standard RINEX format with decimal at position -6
        if (
            len(slice_text) >= OBS_SLICE_MIN_LEN
            and len(slice_text) <= OBS_SLICE_MAX_LEN
            and slice_text[OBS_SLICE_DECIMAL_POS] == "."
        ):
            slice_chars = list(slice_text)
            ssi = slice_chars.pop(-1) if len(slice_chars) > 0 else ""
            lli = slice_chars.pop(-1) if len(slice_chars) > 0 else ""

            # Convert LLI and SSI
            lli = int(lli) if lli.strip() and lli.isdigit() else None
            ssi = int(ssi) if ssi.strip() and ssi.isdigit() else None

            # Convert value
            value_str = "".join(slice_chars).strip()
            if value_str:
                value = float(value_str)
                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    try:
        # Method 2: Flexible parsing for variable-length records
        slice_trimmed = slice_text.strip()
        if not slice_trimmed:
            return None, None, None

        # Look for a decimal point to identify the numeric value
        if "." in slice_trimmed:
            # Find the main numeric value (supports negative numbers)
            number_match = re.search(r"(-?\d+\.\d+)", slice_trimmed)

            if number_match:
                value = float(number_match.group(1))

                # Check for LLI/SSI indicators after the number
                remaining_part = slice_trimmed[number_match.end() :].strip()
                lli = None
                ssi = None

                # Parse remaining characters as potential LLI/SSI
                if remaining_part:
                    # Could be just SSI, or LLI followed by SSI
                    if len(remaining_part) == 1:
                        # Just one indicator - assume it's SSI
                        if remaining_part.isdigit():
                            ssi = int(remaining_part)
                    elif len(remaining_part) >= LLI_SSI_PAIR_LEN:
                        # Two or more characters - take last two as LLI, SSI
                        lli_char = remaining_part[-2]
                        ssi_char = remaining_part[-1]

                        if lli_char.isdigit():
                            lli = int(lli_char)
                        if ssi_char.isdigit():
                            ssi = int(ssi_char)

                return value, lli, ssi

    except (ValueError, IndexError):
        pass

    # Method 3: Last resort - try simple float parsing
    try:
        simple_value = float(slice_text.strip())
        return simple_value, None, None
    except ValueError:
        pass

    return None, None, None

process_satellite_data(s)

Process satellite data line into a Satellite object with observations.

Handles variable-length observation records correctly by adaptively parsing based on the actual line length and content.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def process_satellite_data(self, s: str) -> Satellite:
    """Process satellite data line into a Satellite object with observations.

    Handles variable-length observation records correctly by adaptively parsing
    based on the actual line length and content.
    """
    sv = s[:3].strip()
    satellite = Satellite(sv=sv)
    bands_tbe = [f"{sv}|{b}" for b in self.header.obs_codes_per_system[sv[0]]]

    # Get the data part (after sv identifier)
    data_part = s[3:]

    # Process each observation adaptively
    for i, band in enumerate(bands_tbe):
        start_idx = i * 16
        end_idx = start_idx + 16

        # Check if we have enough data for this observation
        if start_idx >= len(data_part):
            # No more data available - create empty observation
            observation = Observation(
                obs_type=band.split("|")[1][0],
                value=None,
                lli=None,
                ssi=None,
            )
            satellite.add_observation(observation)
            continue

        # Extract the slice, but handle variable length
        if end_idx <= len(data_part):
            # Full 16-character slice available
            slice_data = data_part[start_idx:end_idx]
        else:
            # Partial slice - pad with spaces to maintain consistency
            available_slice = data_part[start_idx:]
            slice_data = available_slice.ljust(16)  # Pad with spaces if needed

        value, lli, ssi = self.parse_observation_slice(slice_data)

        observation = Observation(
            obs_type=band.split("|")[1][0],
            value=value,
            lli=lli,
            ssi=ssi,
        )
        satellite.add_observation(observation)

    return satellite
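The adaptive slicing above amounts to cutting the data part into 16-character columns and right-padding a truncated trailing column so every field has the same width. A standalone sketch with a made-up data line:

```python
# Two observation columns; the second was truncated by the writer
data_part = "  24178587.236 7  24178590.1"
n_obs = 2

fields = []
for i in range(n_obs):
    chunk = data_part[i * 16 : (i + 1) * 16]
    fields.append(chunk.ljust(16))  # pad a short trailing slice to full width

# Both fields are now uniformly 16 characters wide
```

Padding rather than rejecting short lines keeps receivers that drop trailing blanks from losing otherwise valid observations.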

iter_epochs()

Yield epochs one by one instead of materializing the whole list.

Returns

Generator : Generator yielding Rnxv3ObsEpochRecord objects.

Yields

Rnxv3ObsEpochRecord : Each epoch with timestamp and satellite observations.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs(self) -> Iterator[Rnxv3ObsEpochRecord]:
    """Yield epochs one by one instead of materializing the whole list.

    Returns
    -------
    Generator
        Generator yielding Rnxv3ObsEpochRecord objects

    Yields
    ------
    Rnxv3ObsEpochRecord
        Each epoch with timestamp and satellite observations

    """
    for start, end in self.get_epoch_record_batches():
        try:
            info = Rnxv3ObsEpochRecordLineModel.model_validate(
                {"epoch": self._lines[start]}
            )

            # Skip event epochs (flag 2-6: special records, not observations)
            if info.epoch_flag > 1:
                continue

            # Filter out blank/whitespace-only lines from data slice
            data = [line for line in self._lines[start + 1 : end] if line.strip()]
            epoch = Rnxv3ObsEpochRecord(
                info=info,
                data=[self.process_satellite_data(line) for line in data],
            )
            yield epoch
        except (InvalidEpochError, IncompleteEpochError, ValueError):
            # Skip epochs with validation errors (invalid SV, malformed data,
            # pydantic ValidationError inherits from ValueError)
            pass

iter_epochs_in_range(start, end)

Yield epochs lazily that fall into the given datetime range.

Parameters

start : datetime
    Start of time range (inclusive).
end : datetime
    End of time range (inclusive).

Returns

Generator : Generator yielding epochs in the specified range.

Yields

Rnxv3ObsEpochRecord : Epochs within the time range.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def iter_epochs_in_range(
    self,
    start: datetime,
    end: datetime,
) -> Iterable[Rnxv3ObsEpochRecord]:
    """Yield epochs lazily that fall into the given datetime range.

    Parameters
    ----------
    start : datetime
        Start of time range (inclusive)
    end : datetime
        End of time range (inclusive)

    Returns
    -------
    Generator
        Generator yielding epochs in the specified range

    Yields
    ------
    Rnxv3ObsEpochRecord
        Epochs within the time range

    """
    for epoch in self.iter_epochs():
        dt = self.get_datetime_from_epoch_record_info(epoch.info)
        if start <= dt <= end:
            yield epoch

get_datetime_from_epoch_record_info(epoch_record_info)

Convert epoch record info to datetime object.

Parameters

epoch_record_info : Rnxv3ObsEpochRecordLineModel
    Parsed epoch record line.

Returns

datetime : Timestamp from epoch record.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def get_datetime_from_epoch_record_info(
    self,
    epoch_record_info: Rnxv3ObsEpochRecordLineModel,
) -> datetime:
    """Convert epoch record info to datetime object.

    Parameters
    ----------
    epoch_record_info : Rnxv3ObsEpochRecordLineModel
        Parsed epoch record line

    Returns
    -------
    datetime
        Timestamp from epoch record

    """
    return datetime(
        year=int(epoch_record_info.year),
        month=int(epoch_record_info.month),
        day=int(epoch_record_info.day),
        hour=int(epoch_record_info.hour),
        minute=int(epoch_record_info.minute),
        second=int(epoch_record_info.seconds),
        tzinfo=UTC,
    )

epochrecordinfo_dt_to_numpy_dt(epch) staticmethod

Convert Python datetime to numpy datetime64[ns].

Parameters

epch : Rnxv3ObsEpochRecord
    Epoch record containing timestamp info.

Returns

np.datetime64 : Numpy datetime64 with nanosecond precision.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
@staticmethod
def epochrecordinfo_dt_to_numpy_dt(
    epch: Rnxv3ObsEpochRecord,
) -> np.datetime64:
    """Convert Python datetime to numpy datetime64[ns].

    Parameters
    ----------
    epch : Rnxv3ObsEpochRecord
        Epoch record containing timestamp info

    Returns
    -------
    np.datetime64
        Numpy datetime64 with nanosecond precision

    """
    dt = datetime(
        year=int(epch.info.year),
        month=int(epch.info.month),
        day=int(epch.info.day),
        hour=int(epch.info.hour),
        minute=int(epch.info.minute),
        second=int(epch.info.seconds),
        tzinfo=UTC,
    )
    # np.datetime64 doesn't support timezone info, but datetime is already UTC
    # Convert to naive datetime (UTC) to avoid warning
    return np.datetime64(dt.replace(tzinfo=None), "ns")
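The timezone detail is worth isolating: np.datetime64 complains about tz-aware datetimes, so the UTC value is made naive before conversion, which loses nothing since the wall-clock value is already UTC. A minimal sketch:

```python
from datetime import datetime, timezone
import numpy as np

dt = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)

# np.datetime64 rejects or warns on tz-aware inputs; the value is already
# UTC, so dropping tzinfo changes nothing semantically
ts = np.datetime64(dt.replace(tzinfo=None), "ns")
# str(ts) == "2024-01-01T12:00:30.000000000"
```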

infer_sampling_interval()

Infer sampling interval from consecutive epoch deltas.

Returns

pint.Quantity or None : Sampling interval in seconds, or None if it cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_sampling_interval(self) -> pint.Quantity | None:
    """Infer sampling interval from consecutive epoch deltas.

    Returns
    -------
    pint.Quantity or None
        Sampling interval in seconds, or None if it cannot be inferred

    """
    dts = self._epoch_datetimes()
    if len(dts) < MIN_EPOCHS_FOR_INTERVAL:
        return None
    # Compute deltas
    deltas: list[timedelta] = [b - a for a, b in pairwise(dts) if b >= a]
    if not deltas:
        return None
    # Pick the most common delta (robust to an occasional missing epoch)
    seconds = Counter(
        int(dt.total_seconds()) for dt in deltas if dt.total_seconds() > 0
    )
    if not seconds:
        return None
    mode_seconds, _ = seconds.most_common(1)[0]
    return (mode_seconds * UREG.second).to(UREG.seconds)

infer_dump_interval(sampling_interval=None)

Infer the intended dump interval for the RINEX file.

Parameters

sampling_interval : pint.Quantity, optional
    Known sampling interval. If provided, returns (#epochs * sampling_interval).

Returns

pint.Quantity or None : Dump interval in seconds, or None if it cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def infer_dump_interval(
    self, sampling_interval: pint.Quantity | None = None
) -> pint.Quantity | None:
    """Infer the intended dump interval for the RINEX file.

    Parameters
    ----------
    sampling_interval : pint.Quantity, optional
        Known sampling interval. If provided, returns (#epochs * sampling_interval)

    Returns
    -------
    pint.Quantity or None
        Dump interval in seconds, or None if it cannot be inferred

    """
    idx = self.get_epoch_record_batches()
    n_epochs = len(idx)
    if n_epochs == 0:
        return None

    if sampling_interval is not None:
        return (n_epochs * sampling_interval).to(UREG.seconds)

    # Fallback: time coverage inclusive (last - first) + typical step
    dts = self._epoch_datetimes()
    if len(dts) == 0:
        return None
    if len(dts) == 1:
        # single epoch: treat as 1 * unknown step (cannot infer)
        return None

    # Estimate step from data
    est_step = self.infer_sampling_interval()
    if est_step is None:
        return None

    # Inclusive coverage often equals (n_epochs - 1) * step; intended
    # dump interval is n_epochs * step.
    return (n_epochs * est_step.to(UREG.seconds)).to(UREG.seconds)

validate_epoch_completeness(dump_interval=None, sampling_interval=None)

Validate that the number of epochs matches the expected dump interval.

Parameters

dump_interval : str or pint.Quantity, optional
    Expected file dump interval. If None, inferred from epochs.
sampling_interval : str or pint.Quantity, optional
    Expected sampling interval. If None, inferred from epochs.

Returns

None

Raises

MissingEpochError
    If the total sampling time doesn't match the dump interval.
ValueError
    If the intervals cannot be inferred.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_epoch_completeness(
    self,
    dump_interval: str | pint.Quantity | None = None,
    sampling_interval: str | pint.Quantity | None = None,
) -> None:
    """Validate that the number of epochs matches the expected dump interval.

    Parameters
    ----------
    dump_interval : str or pint.Quantity, optional
        Expected file dump interval. If None, inferred from epochs.
    sampling_interval : str or pint.Quantity, optional
        Expected sampling interval. If None, inferred from epochs.

    Returns
    -------
    None

    Raises
    ------
    MissingEpochError
        If total sampling time doesn't match dump interval
    ValueError
        If intervals cannot be inferred

    """
    # Normalize/Infer sampling interval
    if sampling_interval is None:
        inferred = self.infer_sampling_interval()
        if inferred is None:
            msg = "Could not infer sampling interval from epochs"
            raise ValueError(msg)
        sampling_interval = inferred
    # normalize to pint
    elif not isinstance(sampling_interval, pint.Quantity):
        sampling_interval = UREG.Quantity(sampling_interval).to(UREG.seconds)

    # Normalize/Infer dump interval
    if dump_interval is None:
        inferred_dump = self.infer_dump_interval(
            sampling_interval=sampling_interval
        )
        if inferred_dump is None:
            msg = "Could not infer dump interval from file"
            raise ValueError(msg)
        dump_interval = inferred_dump
    elif not isinstance(dump_interval, pint.Quantity):
        # Accept '15 min', '1h', etc.
        dump_interval = UREG.Quantity(dump_interval).to(UREG.seconds)

    # Build inputs for the validator model
    epoch_indices = self.get_epoch_record_batches()

    # This throws MissingEpochError automatically if inconsistent
    cast(Any, Rnxv3ObsEpochRecordCompletenessModel)(
        epoch_records_indeces=epoch_indices,
        rnx_file_dump_interval=dump_interval,
        sampling_interval=sampling_interval,
    )
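
The completeness rule enforced by the validator model amounts to comparing the observed epoch count with dump_interval / sampling_interval. A minimal sketch, with `check_epoch_completeness` and a stand-in `MissingEpochError` defined here for illustration:

```python
class MissingEpochError(ValueError):
    """Raised when a file has fewer epochs than its dump interval implies."""

def check_epoch_completeness(n_epochs: int, dump_s: float, sampling_s: float) -> None:
    """Raise MissingEpochError unless n_epochs matches dump_s / sampling_s."""
    expected = round(dump_s / sampling_s)
    if n_epochs != expected:
        raise MissingEpochError(
            f"Expected {expected} epochs ({dump_s} s / {sampling_s} s), got {n_epochs}"
        )

check_epoch_completeness(120, 3600, 30)  # a complete hourly 30 s file passes
try:
    check_epoch_completeness(118, 3600, 30)  # two epochs missing
except MissingEpochError as e:
    print(e)
```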

filter_by_overlapping_groups(ds, group_preference=None)

Filter overlapping bands using per-group preferences.

Parameters

ds : xr.Dataset
    Dataset with sid dimension and signal properties.
group_preference : dict[str, str], optional
    Mapping of overlap group to preferred band.

Returns

xr.Dataset
    Dataset filtered to preferred overlapping bands.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def filter_by_overlapping_groups(
    self,
    ds: xr.Dataset,
    group_preference: dict[str, str] | None = None,
) -> xr.Dataset:
    """Filter overlapping bands using per-group preferences.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset with `sid` dimension and signal properties.
    group_preference : dict[str, str], optional
        Mapping of overlap group to preferred band.

    Returns
    -------
    xr.Dataset
        Dataset filtered to preferred overlapping bands.

    """
    if group_preference is None:
        group_preference = {
            "L1_E1_B1I": "L1",
            "L5_E5a": "L5",
            "L2_E5b_B2b": "L2",
        }

    keep = []
    for sid in ds.sid.values:
        parts = str(sid).split("|")
        band = parts[1] if len(parts) >= 2 else ""
        group = self._signal_mapper.get_overlapping_group(band)
        if group and group in group_preference:
            if band == group_preference[group]:
                keep.append(sid)
        else:
            keep.append(sid)
    return ds.sel(sid=keep)
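
The keep/drop decision can be exercised on plain sid strings. A stand-alone sketch in which the overlap-group and preference tables are hard-coded, partial stand-ins for `SignalIDMapper.get_overlapping_group` and the default preferences:

```python
# Illustrative tables only; the real mapper covers more bands (e.g. B1I, E5b).
OVERLAP_GROUPS = {"L1": "L1_E1_B1I", "E1": "L1_E1_B1I", "L5": "L5_E5a", "E5a": "L5_E5a"}
PREFERENCE = {"L1_E1_B1I": "L1", "L5_E5a": "L5"}

def keep_sid(sid: str) -> bool:
    """Keep a sid if its band is preferred within its overlap group."""
    parts = sid.split("|")
    band = parts[1] if len(parts) >= 2 else ""
    group = OVERLAP_GROUPS.get(band)
    if group and group in PREFERENCE:
        return band == PREFERENCE[group]
    return True  # bands outside known overlap groups are always kept

sids = ["G01|L1|C", "E12|E1|C", "G01|L5|Q", "C05|B1I|I"]
print([s for s in sids if keep_sid(s)])  # ['G01|L1|C', 'G01|L5|Q', 'C05|B1I|I']
```

Note that E12|E1|C is dropped because E1 shares a group with the preferred L1, while C05|B1I|I survives here only because this toy table does not list B1I.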

create_rinex_netcdf_with_signal_id(start=None, end=None)

Create a NetCDF dataset with signal IDs.

Always uses the fast single-pass path. Optionally restricts to epochs within a datetime range via post-filtering.

Parameters

start : datetime, optional
    Start of time range (inclusive).
end : datetime, optional
    End of time range (inclusive).

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def create_rinex_netcdf_with_signal_id(
    self,
    start: datetime | None = None,
    end: datetime | None = None,
) -> xr.Dataset:
    """Create a NetCDF dataset with signal IDs.

    Always uses the fast single-pass path.  Optionally restricts to
    epochs within a datetime range via post-filtering.

    Parameters
    ----------
    start : datetime, optional
        Start of time range (inclusive).
    end : datetime, optional
        End of time range (inclusive).

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid).

    """
    ds = self._create_dataset_single_pass()

    if start or end:
        ds = ds.sel(epoch=slice(start, end))

    return ds
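
The post-filtering step corresponds to an inclusive [start, end] selection on the epoch axis; stripped of xarray, the selection logic is just (a sketch):

```python
from datetime import datetime, timedelta

def filter_epochs(epochs, start=None, end=None):
    """Keep epochs within the inclusive [start, end] range; open ends pass everything."""
    return [
        e for e in epochs
        if (start is None or e >= start) and (end is None or e <= end)
    ]

t0 = datetime(2024, 1, 1)
epochs = [t0 + timedelta(minutes=m) for m in range(5)]
kept = filter_epochs(epochs, start=t0 + timedelta(minutes=1), end=t0 + timedelta(minutes=3))
print(len(kept))  # 3
```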

to_ds(keep_data_vars=None, **kwargs)

Convert RINEX observations to xarray.Dataset with signal ID structure.

Parameters

outname : Path or str, optional
    If provided, saves the dataset to this file path.
keep_data_vars : list of str or None, optional
    Data variables to include in the dataset. Defaults to the config value.
write_global_attrs : bool, default False
    If True, adds comprehensive global attributes.
pad_global_sid : bool, default True
    If True, pads to the global signal ID space.
strip_fillval : bool, default True
    If True, removes fill values.
add_future_datavars : bool, default True
    If True, adds placeholder variables for future data.
keep_sids : list of str or None, default None
    If provided, filters/pads the dataset to these specific SIDs.
    If None and pad_global_sid=True, pads to all possible SIDs.

Returns

xr.Dataset
    Dataset with dimensions (epoch, sid) and the requested data variables.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert RINEX observations to xarray.Dataset with signal ID structure.

    Parameters
    ----------
    outname : Path or str, optional
        If provided, saves dataset to this file path
    keep_data_vars : list of str or None, optional
        Data variables to include in dataset. Defaults to config value.
    write_global_attrs : bool, default False
        If True, adds comprehensive global attributes
    pad_global_sid : bool, default True
        If True, pads to global signal ID space
    strip_fillval : bool, default True
        If True, removes fill values
    add_future_datavars : bool, default True
        If True, adds placeholder variables for future data
    keep_sids : list of str or None, default None
        If provided, filters/pads dataset to these specific SIDs.
        If None and pad_global_sid=True, pads to all possible SIDs.

    Returns
    -------
    xr.Dataset
        Dataset with dimensions (epoch, sid) and requested data variables

    """
    outname = cast(Path | str | None, kwargs.pop("outname", None))
    write_global_attrs = bool(kwargs.pop("write_global_attrs", False))
    pad_global_sid = bool(kwargs.pop("pad_global_sid", True))
    strip_fillval = bool(kwargs.pop("strip_fillval", True))
    add_future_datavars = bool(kwargs.pop("add_future_datavars", True))
    keep_sids = cast(list[str] | None, kwargs.pop("keep_sids", None))

    if keep_data_vars is None:
        from canvod.utils.config import load_config

        keep_data_vars = load_config().processing.processing.keep_rnx_vars

    ds = self.create_rinex_netcdf_with_signal_id()

    # drop unwanted vars
    for var in list(ds.data_vars):
        if var not in keep_data_vars:
            ds = ds.drop_vars([var])

    if pad_global_sid:
        from canvod.auxiliary.preprocessing import pad_to_global_sid

        # Pad/filter to specified sids or all possible sids
        ds = pad_to_global_sid(ds, keep_sids=keep_sids)

    if strip_fillval:
        from canvod.auxiliary.preprocessing import strip_fillvalue

        ds = strip_fillvalue(ds)

    if add_future_datavars:
        pass

    if write_global_attrs:
        ds.attrs.update(self._create_comprehensive_attrs())

    ds.attrs.update(self._build_attrs())

    if outname:
        from canvod.utils.config import load_config as _load_config

        comp = _load_config().processing.compression
        encoding = {
            var: {"zlib": comp.zlib, "complevel": comp.complevel}
            for var in ds.data_vars
        }
        ds.to_netcdf(str(outname), encoding=encoding)

    # Validate output structure for pipeline compatibility
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds
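
The option handling above follows a pop-with-default pattern: each known keyword is popped from `**kwargs` with its documented default. A minimal sketch of that pattern (the `unhandled` bucket is an addition for illustration; the real method simply leaves unrecognized leftovers untouched):

```python
def consume_options(**kwargs: object) -> dict[str, object]:
    """Pop known options with defaults, mirroring the to_ds() pattern."""
    opts = {
        "write_global_attrs": bool(kwargs.pop("write_global_attrs", False)),
        "pad_global_sid": bool(kwargs.pop("pad_global_sid", True)),
        "strip_fillval": bool(kwargs.pop("strip_fillval", True)),
    }
    opts["unhandled"] = dict(kwargs)  # anything left over was not recognized
    return opts

print(consume_options(pad_global_sid=False, typo_option=1))
# {'write_global_attrs': False, 'pad_global_sid': False, 'strip_fillval': True, 'unhandled': {'typo_option': 1}}
```

One consequence of this design is that a misspelled option name does not raise; surfacing leftovers, as the sketch does, is one way to catch such typos.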

validate_rinex_304_compliance(ds=None, strict=False, print_report=True)

Run enhanced RINEX 3.04 specification validation.

Validates:
1. System-specific observation codes
2. GLONASS mandatory fields (slot/frequency, biases)
3. Phase shift records (RINEX 3.01+)
4. Observation value ranges

Parameters

ds : xr.Dataset, optional
    Dataset to validate. If None, creates one from the current file.
strict : bool
    If True, raise ValueError on validation failures.
print_report : bool
    If True, print a validation report to the console.

Returns

dict[str, list[str]]
    Validation results by category.

Examples

>>> reader = Rnxv3Obs(fpath="station.24o")
>>> results = reader.validate_rinex_304_compliance()
>>> # Or validate a specific dataset
>>> ds = reader.to_ds()
>>> results = reader.validate_rinex_304_compliance(ds=ds)

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def validate_rinex_304_compliance(
    self,
    ds: xr.Dataset | None = None,
    strict: bool = False,
    print_report: bool = True,
) -> dict[str, list[str]]:
    """Run enhanced RINEX 3.04 specification validation.

    Validates:
    1. System-specific observation codes
    2. GLONASS mandatory fields (slot/frequency, biases)
    3. Phase shift records (RINEX 3.01+)
    4. Observation value ranges

    Parameters
    ----------
    ds : xr.Dataset, optional
        Dataset to validate. If None, creates one from current file.
    strict : bool
        If True, raise ValueError on validation failures
    print_report : bool
        If True, print validation report to console

    Returns
    -------
    dict[str, list[str]]
        Validation results by category

    Examples
    --------
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> results = reader.validate_rinex_304_compliance()
    >>> # Or validate a specific dataset
    >>> ds = reader.to_ds()
    >>> results = reader.validate_rinex_304_compliance(ds=ds)

    """
    if ds is None:
        ds = self.to_ds(write_global_attrs=False)

    # Prepare header dict for validators
    header_dict: dict[str, Any] = {
        "obs_codes_per_system": self.header.obs_codes_per_system,
    }

    # Add GLONASS-specific headers if available
    if hasattr(self.header, "glonass_slot_frq"):
        header_dict["GLONASS SLOT / FRQ #"] = self.header.glonass_slot_frq

    if hasattr(self.header, "glonass_cod_phs_bis"):
        header_dict["GLONASS COD/PHS/BIS"] = self.header.glonass_cod_phs_bis

    if hasattr(self.header, "phase_shift"):
        header_dict["SYS / PHASE SHIFT"] = self.header.phase_shift

    # Run validation
    results = RINEX304ComplianceValidator.validate_all(
        ds=ds, header_dict=header_dict, strict=strict
    )

    if print_report:
        RINEX304ComplianceValidator.print_validation_report(results)

    return results

adapt_existing_rnxv3obs_class(original_class_path=None)

Provide guidance to integrate the enhanced sid functionality.

This function provides guidance on how to modify the existing class to support the new sid structure alongside the current OFT structure.

Returns

str
    Integration instructions.

Source code in packages/canvod-readers/src/canvod/readers/rinex/v3_04.py
def adapt_existing_rnxv3obs_class(original_class_path: str | None = None) -> str:
    """Provide guidance to integrate the enhanced sid functionality.

    This function provides guidance on how to modify the existing class
    to support the new sid structure alongside the current OFT structure.

    Returns
    -------
    str
        Integration instructions

    """
    _ = original_class_path
    return """
    INTEGRATION GUIDE: Adapting Rnxv3Obs for sid Structure
    ============================================================

    To integrate the new sid functionality into your existing Rnxv3Obs class:

    1. ADD THE SIGNAL_ID_MAPPER CLASS:
       - Copy the SignalIDMapper class to your rinex_reader.py file
       - This handles the mapping logic and band properties

    2. ADD NEW METHODS TO Rnxv3Obs CLASS:

       Method: create_rinex_netcdf_with_signal_id()
       - Copy from EnhancedRnxv3Obs.create_rinex_netcdf_with_signal_id()
       - This creates the new sid-based structure

       Method: filter_by_overlapping_groups()
       - Copy from EnhancedRnxv3Obs.filter_by_overlapping_groups()
       - Handles overlapping signal filtering (Problem A solution)

       Method: to_ds()
       - Copy from EnhancedRnxv3Obs.to_ds()
       - Main interface for creating sid datasets

       Method: create_legacy_compatible_dataset()
       - Copy from EnhancedRnxv3Obs.create_legacy_compatible_dataset()
       - Provides backward compatibility

    3. UPDATE THE __init__ METHOD:
       Add: self.signal_mapper = SignalIDMapper()

    4. MODIFY EXISTING METHODS:
       - Keep existing create_rinex_netcdf_with_oft() for OFT compatibility
       - Add sid option to your main interface methods
       - Update data handlers to support sid dimension

    5. UPDATE DATA_HANDLER/RNX_PARSER.PY:
       - Modify concatenate_datasets() to handle sid dimension
       - Add sid detection alongside OFT detection
       - Update encoding to handle sid string coordinates

    6. UPDATE PROCESSOR/PROCESSOR.PY:
       - Add sid support to create_common_space_datatree()
       - Handle both OFT and sid structures in alignment logic

    BENEFITS OF THIS STRUCTURE:
    ===========================

    ✓ Solves Problem A: Bandwidth overlap handling
      - Overlapping signals kept separate with metadata for filtering
      - band properties include bandwidth information

    ✓ Solves Problem B: code-specific performance differences
      - Each sv|band|code combination gets unique sid
      - No more priority-based LUT - all combinations preserved

    ✓ Maintains compatibility:
      - Legacy conversion available
      - OFT structure still supported
      - Existing code continues to work

    ✓ Enhanced filtering capabilities:
      - Filter by system, band, code independently
      - Complex filtering with multiple criteria
      - Overlap group filtering for analysis

    MIGRATION PATH:
    ===============

    Phase 1: Add sid methods alongside existing OFT methods
    Phase 2: Update data handlers to support both structures
    Phase 3: Gradually migrate analysis code to use sid
    Phase 4: Deprecate old frequency-mapping approach (optional)

    EXAMPLE USAGE AFTER INTEGRATION:
    =================================

    # Create datasets with different structures
    ds_oft = rnx.create_rinex_netcdf_with_oft()           # Current OFT structure
    ds_signal = rnx.create_rinex_netcdf_with_signal_id()  # New sid structure
    ds_legacy = rnx.create_rinex_netcdf(mapped_epochs)    # Legacy structure

    # Advanced sid usage
    ds_enhanced = rnx.to_ds(
        keep_data_vars=["SNR", "Phase"],
        apply_overlap_filter=True,
        overlap_preferences={'L1_E1_B1I': 'L1'}  # Prefer GPS L1 over Galileo E1
    )
    """

Base Reader

Abstract base class for GNSS data readers.

Defines the interface that all readers (RINEX v3, RINEX v2, SBF, future formats) must implement to ensure compatibility with the downstream pipeline:
- VOD calculation (canvod-vod)
- Storage (canvod-store / MyIcechunkStore)
- Grid operations (canvod-grids)

Contract constants (REQUIRED_DIMS, REQUIRED_COORDS, etc.) are the single source of truth for the output Dataset structure. Use :func:validate_dataset to check any Dataset against them.

DatasetStructureValidator

Bases: BaseModel

Validates that an xarray.Dataset meets the GNSSDataReader contract.

Wraps a Dataset and checks it against the contract constants above. Use this in tests and reader implementations to catch structural errors early with clear messages.

Examples

>>> validator = DatasetStructureValidator(dataset=ds)
>>> validator.validate_all()          # raises ValueError on any violation
>>> validator.validate_dimensions()   # check just one aspect

Source code in packages/canvod-readers/src/canvod/readers/base.py
class DatasetStructureValidator(BaseModel):
    """Validates that an xarray.Dataset meets the GNSSDataReader contract.

    Wraps a Dataset and checks it against the contract constants above.
    Use this in tests and reader implementations to catch structural errors
    early with clear messages.

    Examples
    --------
    >>> validator = DatasetStructureValidator(dataset=ds)
    >>> validator.validate_all()          # raises ValueError on any violation
    >>> validator.validate_dimensions()   # check just one aspect
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    dataset: xr.Dataset

    def validate_all(self, required_vars: list[str] | None = None) -> None:
        """Run all validations, collecting **all** errors.

        Delegates to :func:`validate_dataset` so the logic lives in one place.
        """
        validate_dataset(self.dataset, required_vars=required_vars)

    def validate_dimensions(self) -> None:
        """Check that required dimensions (epoch, sid) exist."""
        missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
        if missing:
            raise ValueError(f"Missing required dimensions: {missing}")

    def validate_coordinates(self) -> None:
        """Check that required coordinates exist with correct dtypes."""
        for coord, expected_dtype in REQUIRED_COORDS.items():
            if coord not in self.dataset.coords:
                raise ValueError(f"Missing required coordinate: {coord}")
            actual = str(self.dataset[coord].dtype)
            if expected_dtype == "object":
                is_valid_string = actual == "object" or actual.startswith("StringDType")
                if not is_valid_string:
                    raise ValueError(
                        f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                    )
            elif expected_dtype not in actual:
                raise ValueError(
                    f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
                )

    def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
        """Check that required data variables exist with correct dims."""
        if required_vars is None:
            required_vars = list(DEFAULT_REQUIRED_VARS)
        missing = set(required_vars) - set(self.dataset.data_vars)
        if missing:
            raise ValueError(f"Missing required data variables: {missing}")
        for var in self.dataset.data_vars:
            if self.dataset[var].dims != REQUIRED_DIMS:
                raise ValueError(
                    f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                    f"got {self.dataset[var].dims}"
                )

    def validate_attributes(self) -> None:
        """Check that required global attributes are present."""
        missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")
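
As the docstring notes, validate_all collects every violation before raising (via validate_dataset), whereas the individual checks above fail fast on the first problem. The collection pattern is roughly the following sketch, with made-up check functions standing in for the real dimension/coordinate/attribute validators:

```python
def validate_all_collecting(checks) -> None:
    """Run every check, gather messages, and raise once with the full list."""
    errors: list[str] = []
    for check in checks:
        try:
            check()
        except ValueError as e:
            errors.append(str(e))
    if errors:
        raise ValueError("Dataset contract violations:\n- " + "\n- ".join(errors))

def bad_dims():  # hypothetical failing check
    raise ValueError("Missing required dimensions: {'sid'}")

def good_attrs():  # hypothetical passing check
    pass

try:
    validate_all_collecting([bad_dims, good_attrs])
except ValueError as e:
    print(e)
```

Collecting all errors at once is the friendlier behavior for tests: one run reports every structural problem instead of revealing them one at a time.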

validate_all(required_vars=None)

Run all validations, collecting all errors.

Delegates to :func:validate_dataset so the logic lives in one place.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_all(self, required_vars: list[str] | None = None) -> None:
    """Run all validations, collecting **all** errors.

    Delegates to :func:`validate_dataset` so the logic lives in one place.
    """
    validate_dataset(self.dataset, required_vars=required_vars)

validate_dimensions()

Check that required dimensions (epoch, sid) exist.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dimensions(self) -> None:
    """Check that required dimensions (epoch, sid) exist."""
    missing = set(REQUIRED_DIMS) - set(self.dataset.dims)
    if missing:
        raise ValueError(f"Missing required dimensions: {missing}")

validate_coordinates()

Check that required coordinates exist with correct dtypes.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_coordinates(self) -> None:
    """Check that required coordinates exist with correct dtypes."""
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in self.dataset.coords:
            raise ValueError(f"Missing required coordinate: {coord}")
        actual = str(self.dataset[coord].dtype)
        if expected_dtype == "object":
            is_valid_string = actual == "object" or actual.startswith("StringDType")
            if not is_valid_string:
                raise ValueError(
                    f"Coordinate {coord}: expected string (object/StringDType), got {actual}"
                )
        elif expected_dtype not in actual:
            raise ValueError(
                f"Coordinate {coord}: expected {expected_dtype}, got {actual}"
            )

validate_data_variables(required_vars=None)

Check that required data variables exist with correct dims.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_data_variables(self, required_vars: list[str] | None = None) -> None:
    """Check that required data variables exist with correct dims."""
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)
    missing = set(required_vars) - set(self.dataset.data_vars)
    if missing:
        raise ValueError(f"Missing required data variables: {missing}")
    for var in self.dataset.data_vars:
        if self.dataset[var].dims != REQUIRED_DIMS:
            raise ValueError(
                f"Variable {var}: expected dims {REQUIRED_DIMS}, "
                f"got {self.dataset[var].dims}"
            )

validate_attributes()

Check that required global attributes are present.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_attributes(self) -> None:
    """Check that required global attributes are present."""
    missing = REQUIRED_ATTRS - set(self.dataset.attrs.keys())
    if missing:
        raise ValueError(f"Missing required attributes: {missing}")

SignalID

Bases: BaseModel

Validated signal identifier (SV + band + code).

>>> sid = SignalID(sv="G01", band="L1", code="C")
>>> str(sid)
'G01|L1|C'
>>> sid.system
'G'

Source code in packages/canvod-readers/src/canvod/readers/base.py
class SignalID(BaseModel):
    """Validated signal identifier (SV + band + code).

    >>> sid = SignalID(sv="G01", band="L1", code="C")
    >>> str(sid)
    'G01|L1|C'
    >>> sid.system
    'G'
    """

    model_config = ConfigDict(frozen=True)

    sv: str
    band: str
    code: str

    @field_validator("sv")
    @classmethod
    def _validate_sv(cls, v: str) -> str:
        if not SV_PATTERN.match(v):
            raise ValueError(
                f"Invalid SV: {v!r} — expected system letter + 2-digit PRN "
                f"(e.g. 'G01'). Valid systems: G, R, E, C, J, S, I"
            )
        return v

    @property
    def system(self) -> str:
        """GNSS system letter (e.g. 'G' for GPS)."""
        return self.sv[0]

    @property
    def sid(self) -> str:
        """Full signal ID string ('SV|band|code')."""
        return f"{self.sv}|{self.band}|{self.code}"

    def __str__(self) -> str:
        return self.sid

    def __hash__(self) -> int:
        return hash(self.sid)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SignalID):
            return self.sid == other.sid
        return NotImplemented

    @classmethod
    def from_string(cls, sid_str: str) -> SignalID:
        """Parse a signal ID string ('SV|band|code') into a SignalID.

        Parameters
        ----------
        sid_str : str
            Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

        Returns
        -------
        SignalID
            Validated signal identifier.

        Raises
        ------
        ValueError
            If the string does not have exactly three pipe-separated parts.
        """
        parts = sid_str.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
            )
        return cls(sv=parts[0], band=parts[1], code=parts[2])
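
The parse-then-validate flow of from_string, split on the pipe character and check the SV field against SV_PATTERN, can be sketched without pydantic. The regex below is an assumed shape of the real SV_PATTERN (system letter from G, R, E, C, J, S, I plus a two-digit PRN); the exact pattern in base.py may differ:

```python
import re

SV_PATTERN = re.compile(r"^[GRECJSI]\d{2}$")  # assumption: mirrors the real pattern

def parse_sid(sid_str: str) -> tuple[str, str, str]:
    """Split 'SV|band|code' and validate the SV field."""
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(f"Invalid SID format: {sid_str!r}, expected 'SV|band|code'")
    sv, band, code = parts
    if not SV_PATTERN.match(sv):
        raise ValueError(f"Invalid SV: {sv!r}")
    return sv, band, code

print(parse_sid("G01|L1|C"))  # ('G01', 'L1', 'C')
```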

system property

GNSS system letter (e.g. 'G' for GPS).

sid property

Full signal ID string ('SV|band|code').

from_string(sid_str) classmethod

Parse a signal ID string ('SV|band|code') into a SignalID.

Parameters

sid_str : str
    Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

Returns

SignalID
    Validated signal identifier.

Raises

ValueError
    If the string does not have exactly three pipe-separated parts.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@classmethod
def from_string(cls, sid_str: str) -> SignalID:
    """Parse a signal ID string ('SV|band|code') into a SignalID.

    Parameters
    ----------
    sid_str : str
        Signal ID in 'SV|band|code' format (e.g. 'G01|L1|C').

    Returns
    -------
    SignalID
        Validated signal identifier.

    Raises
    ------
    ValueError
        If the string does not have exactly three pipe-separated parts.
    """
    parts = sid_str.split("|")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid SID format: {sid_str!r} — expected 'SV|band|code'"
        )
    return cls(sv=parts[0], band=parts[1], code=parts[2])

GNSSDataReader

Bases: BaseModel, ABC

Abstract base class for all GNSS data format readers.

All readers must:
1. Inherit from this class
2. Implement all abstract methods
3. Return an xarray.Dataset that passes :func:validate_dataset
4. Provide a file hash for deduplication

This ensures compatibility with:
- canvod-vod: VOD calculation
- canvod-store: MyIcechunkStore storage
- canvod-grids: Grid projection operations

Subclasses may override model_config to set frozen, extra, etc. The base class provides arbitrary_types_allowed=True, which is needed by readers that use pint.Quantity or similar third-party types.

Examples

>>> class Rnxv3Obs(GNSSDataReader):
...     def to_ds(self, **kwargs) -> xr.Dataset:
...         # Implementation
...         return dataset
...
>>> reader = Rnxv3Obs(fpath="station.24o")
>>> ds = reader.to_ds()
>>> validate_dataset(ds)

Source code in packages/canvod-readers/src/canvod/readers/base.py
class GNSSDataReader(BaseModel, ABC):
    """Abstract base class for all GNSS data format readers.

    All readers must:
    1. Inherit from this class
    2. Implement all abstract methods
    3. Return xarray.Dataset that passes :func:`validate_dataset`
    4. Provide file hash for deduplication

    This ensures compatibility with:
    - canvod-vod: VOD calculation
    - canvod-store: MyIcechunkStore storage
    - canvod-grids: Grid projection operations

    Subclasses may override ``model_config`` to set ``frozen``, ``extra``,
    etc.  The base class provides ``arbitrary_types_allowed=True`` which is
    needed by readers that use ``pint.Quantity`` or similar third-party types.

    Examples
    --------
    >>> class Rnxv3Obs(GNSSDataReader):
    ...     def to_ds(self, **kwargs) -> xr.Dataset:
    ...         # Implementation
    ...         return dataset
    ...
    >>> reader = Rnxv3Obs(fpath="station.24o")
    >>> ds = reader.to_ds()
    >>> validate_dataset(ds)
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    fpath: Path

    @field_validator("fpath")
    @classmethod
    def _validate_fpath(cls, v: Path) -> Path:
        """Validate that the file path points to an existing file."""
        v = Path(v)
        if not v.is_file():
            raise FileNotFoundError(f"File not found: {v}")
        return v

    @property
    def source_format(self) -> str:
        """Return the format identifier for this reader (e.g. ``"rinex3"``, ``"sbf"``)."""
        return "rinex3"

    @property
    @abstractmethod
    def file_hash(self) -> str:
        """Return SHA256 hash of file for deduplication.

        Used by MyIcechunkStore to avoid duplicate ingestion.
        Must be deterministic and reproducible.

        Returns
        -------
        str
            Short hash (16 chars) or full hash of file content
        """

    @abstractmethod
    def to_ds(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> xr.Dataset:
        """Convert data to xarray.Dataset.

        Must return Dataset with structure:
        - Dims: (epoch, sid)
        - Coords: epoch, sid, sv, system, band, code, freq_*
        - Data vars: At minimum SNR
        - Attrs: Must include "File Hash"

        Parameters
        ----------
        keep_data_vars : list of str, optional
            Data variables to include. If None, includes all available.
        **kwargs
            Implementation-specific parameters

        Returns
        -------
        xr.Dataset
            Dataset that passes :func:`validate_dataset`.
        """

    @abstractmethod
    def iter_epochs(self) -> Iterator[object]:
        """Iterate over epochs in the file.

        Yields
        ------
        Epoch
            Parsed epoch with satellites and observations.
        """

    def to_ds_and_auxiliary(
        self,
        keep_data_vars: list[str] | None = None,
        **kwargs: object,
    ) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
        """Produce the obs dataset and any auxiliary datasets in a single call.

        Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
        Readers that produce metadata (e.g. SBF) override this to collect both
        in a single file scan.

        Returns
        -------
        tuple[xr.Dataset, dict[str, xr.Dataset]]
            ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
            readers with no extra data (RINEX v2/v3).
        """
        return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

    def _build_attrs(self) -> dict[str, str]:
        """Build standard global attributes for the output Dataset.

        Reads institution/author from config, adds timestamp, version,
        and the file hash.

        Returns
        -------
        dict[str, str]
            Ready-to-use attrs dict.
        """
        from canvod.readers.gnss_specs.metadata import get_global_attrs
        from canvod.readers.gnss_specs.utils import get_version_from_pyproject

        attrs = get_global_attrs()
        attrs["Created"] = datetime.now(UTC).isoformat()
        attrs["Software"] = (
            f"{attrs['Software']}, Version: {get_version_from_pyproject()}"
        )
        attrs["File Hash"] = self.file_hash
        return attrs

    @property
    @abstractmethod
    def start_time(self) -> datetime:
        """Return start time of observations.

        Returns
        -------
        datetime
            First observation timestamp in the file.
        """

    @property
    @abstractmethod
    def end_time(self) -> datetime:
        """Return end time of observations.

        Returns
        -------
        datetime
            Last observation timestamp in the file.
        """

    @property
    @abstractmethod
    def systems(self) -> list[str]:
        """Return list of GNSS systems in file.

        Returns
        -------
        list of str
            System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
        """

    @property
    def num_epochs(self) -> int:
        """Return number of epochs in file.

        Default implementation iterates epochs.  Subclasses may override
        with a faster approach.

        Returns
        -------
        int
            Total number of observation epochs.
        """
        return sum(1 for _ in self.iter_epochs())

    @property
    @abstractmethod
    def num_satellites(self) -> int:
        """Return total number of unique satellites observed.

        Returns
        -------
        int
            Count of unique satellite vehicles across all systems.
        """

    def __repr__(self) -> str:
        """Return the string representation."""
        return f"{self.__class__.__name__}(file='{self.fpath.name}')"

source_format property

Return the format identifier for this reader (e.g. "rinex3", "sbf").

file_hash abstractmethod property

Return SHA256 hash of file for deduplication.

Used by MyIcechunkStore to avoid duplicate ingestion. Must be deterministic and reproducible.

Returns

str
    Short hash (16 chars) or full hash of file content
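A minimal sketch of how a concrete reader might compute this hash; the helper name and chunk size are illustrative, not part of canvod:

```python
import hashlib
from pathlib import Path


def sha256_short(fpath: Path, length: int = 16) -> str:
    """SHA256 of the file content, truncated to `length` hex chars.

    Deterministic for a given file, so it is safe for deduplication.
    """
    h = hashlib.sha256()
    with open(fpath, "rb") as f:
        # Read in 1 MiB chunks so large RINEX files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:length]
```
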

start_time abstractmethod property

Return start time of observations.

Returns

datetime
    First observation timestamp in the file.

end_time abstractmethod property

Return end time of observations.

Returns

datetime
    Last observation timestamp in the file.

systems abstractmethod property

Return list of GNSS systems in file.

Returns

list of str
    System identifiers: 'G', 'R', 'E', 'C', 'J', 'S', 'I'
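These single-letter identifiers follow the RINEX v3 convention; a small lookup table (a sketch, not canvod API) makes downstream output more readable:

```python
# RINEX v3 system letter -> constellation name
SYSTEM_NAMES: dict[str, str] = {
    "G": "GPS",
    "R": "GLONASS",
    "E": "Galileo",
    "C": "BeiDou",
    "J": "QZSS",
    "S": "SBAS",
    "I": "NavIC (IRNSS)",
}


def system_name(letter: str) -> str:
    """Expand a RINEX system letter to its constellation name."""
    return SYSTEM_NAMES[letter]
```
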

num_epochs property

Return number of epochs in file.

Default implementation iterates epochs. Subclasses may override with a faster approach.

Returns

int
    Total number of observation epochs.

num_satellites abstractmethod property

Return total number of unique satellites observed.

Returns

int
    Count of unique satellite vehicles across all systems.

to_ds(keep_data_vars=None, **kwargs) abstractmethod

Convert data to xarray.Dataset.

Must return Dataset with structure:

- Dims: (epoch, sid)
- Coords: epoch, sid, sv, system, band, code, freq_*
- Data vars: At minimum SNR
- Attrs: Must include "File Hash"

Parameters

keep_data_vars : list of str, optional
    Data variables to include. If None, includes all available.
**kwargs
    Implementation-specific parameters

Returns

xr.Dataset
    Dataset that passes :func:`validate_dataset`.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def to_ds(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> xr.Dataset:
    """Convert data to xarray.Dataset.

    Must return Dataset with structure:
    - Dims: (epoch, sid)
    - Coords: epoch, sid, sv, system, band, code, freq_*
    - Data vars: At minimum SNR
    - Attrs: Must include "File Hash"

    Parameters
    ----------
    keep_data_vars : list of str, optional
        Data variables to include. If None, includes all available.
    **kwargs
        Implementation-specific parameters

    Returns
    -------
    xr.Dataset
        Dataset that passes :func:`validate_dataset`.
    """

iter_epochs() abstractmethod

Iterate over epochs in the file.

Yields

Epoch
    Parsed epoch with satellites and observations.

Source code in packages/canvod-readers/src/canvod/readers/base.py
@abstractmethod
def iter_epochs(self) -> Iterator[object]:
    """Iterate over epochs in the file.

    Yields
    ------
    Epoch
        Parsed epoch with satellites and observations.
    """

to_ds_and_auxiliary(keep_data_vars=None, **kwargs)

Produce the obs dataset and any auxiliary datasets in a single call.

Default: calls to_ds(**kwargs) and returns an empty auxiliary dict. Readers that produce metadata (e.g. SBF) override this to collect both in a single file scan.

Returns

tuple[xr.Dataset, dict[str, xr.Dataset]]
    (obs_ds, {"name": aux_ds, ...}). Auxiliary dict is empty for readers with no extra data (RINEX v2/v3).

Source code in packages/canvod-readers/src/canvod/readers/base.py
def to_ds_and_auxiliary(
    self,
    keep_data_vars: list[str] | None = None,
    **kwargs: object,
) -> tuple[xr.Dataset, dict[str, xr.Dataset]]:
    """Produce the obs dataset and any auxiliary datasets in a single call.

    Default: calls ``to_ds(**kwargs)`` and returns an empty auxiliary dict.
    Readers that produce metadata (e.g. SBF) override this to collect both
    in a single file scan.

    Returns
    -------
    tuple[xr.Dataset, dict[str, xr.Dataset]]
        ``(obs_ds, {"name": aux_ds, ...})``.  Auxiliary dict is empty for
        readers with no extra data (RINEX v2/v3).
    """
    return self.to_ds(keep_data_vars=keep_data_vars, **kwargs), {}

__repr__()

Return the string representation.

Source code in packages/canvod-readers/src/canvod/readers/base.py
def __repr__(self) -> str:
    """Return the string representation."""
    return f"{self.__class__.__name__}(file='{self.fpath.name}')"

validate_dataset(ds, required_vars=None)

Validate ds meets the GNSSDataReader output contract.

Collects all violations and raises a single ValueError listing every problem, rather than stopping at the first failure.

Parameters

ds : xr.Dataset
    Dataset to validate.
required_vars : list of str, optional
    Data variables that must be present. Defaults to :data:`DEFAULT_REQUIRED_VARS` (["SNR"]).

Raises

ValueError
    If any contract violation is found.
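The collect-then-raise behaviour can be illustrated in isolation; `check_dims` below is a hypothetical stand-in for the dimension check only, not canvod API:

```python
def check_dims(dims, required=("epoch", "sid")) -> None:
    """Collect every violation first, then raise a single ValueError.

    Mirrors the aggregation pattern of validate_dataset: nothing is
    raised until all checks have run.
    """
    errors = [f"Missing required dimension: {d}" for d in required if d not in dims]
    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )
```

Because all problems are reported at once, a malformed dataset needs only one validate/fix cycle instead of one per error.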

Source code in packages/canvod-readers/src/canvod/readers/base.py
def validate_dataset(ds: xr.Dataset, required_vars: list[str] | None = None) -> None:
    """Validate *ds* meets the GNSSDataReader output contract.

    Collects **all** violations and raises a single ``ValueError`` listing
    every problem, rather than stopping at the first failure.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset to validate.
    required_vars : list of str, optional
        Data variables that must be present.  Defaults to
        :data:`DEFAULT_REQUIRED_VARS` (``["SNR"]``).

    Raises
    ------
    ValueError
        If any contract violation is found.
    """
    if required_vars is None:
        required_vars = list(DEFAULT_REQUIRED_VARS)

    errors: list[str] = []

    # -- dimensions --
    missing_dims = set(REQUIRED_DIMS) - set(ds.dims)
    if missing_dims:
        errors.append(f"Missing required dimensions: {missing_dims}")

    # -- coordinates --
    for coord, expected_dtype in REQUIRED_COORDS.items():
        if coord not in ds.coords:
            errors.append(f"Missing required coordinate: {coord}")
            continue

        actual_dtype = str(ds[coord].dtype)
        if expected_dtype == "object":
            # Accept object (VariableLengthUTF8, stable Zarr V3) and numpy 2.0
            # StringDType (same stable type, different numpy representation).
            # Reject <U* (FixedLengthUTF32) — no stable Zarr V3 spec.
            is_valid_string = actual_dtype == "object" or actual_dtype.startswith(
                "StringDType"
            )
            if not is_valid_string:
                errors.append(
                    f"Coordinate {coord} has wrong dtype: "
                    f"expected string (object/StringDType), got {actual_dtype}"
                )
        elif expected_dtype not in actual_dtype:
            errors.append(
                f"Coordinate {coord} has wrong dtype: "
                f"expected {expected_dtype}, got {actual_dtype}"
            )

    # -- data variables --
    missing_vars = set(required_vars) - set(ds.data_vars)
    if missing_vars:
        errors.append(f"Missing required data variables: {missing_vars}")

    expected_var_dims = ("epoch", "sid")
    for var in ds.data_vars:
        if ds[var].dims != expected_var_dims:
            errors.append(
                f"Data variable {var} has wrong dimensions: "
                f"expected {expected_var_dims}, got {ds[var].dims}"
            )

    # -- attributes --
    missing_attrs = REQUIRED_ATTRS - set(ds.attrs.keys())
    if missing_attrs:
        errors.append(f"Missing required attributes: {missing_attrs}")

    if errors:
        raise ValueError(
            "Dataset validation failed:\n" + "\n".join(f"  - {e}" for e in errors)
        )

Dataset Builder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()   # validated Dataset

DatasetBuilder

Guided builder for constructing valid GNSSDataReader output Datasets.

Handles coordinate arrays, dtype enforcement, frequency resolution, and contract validation automatically.

Parameters

reader : GNSSDataReader
    The reader instance (used for _build_attrs() and file hash).
aggregate_glonass_fdma : bool, optional
    Whether to aggregate GLONASS FDMA channels (default True).

Examples

>>> builder = DatasetBuilder(reader)
>>> for epoch in reader.iter_epochs():
...     ei = builder.add_epoch(epoch.timestamp)
...     for obs in epoch.observations:
...         sig = builder.add_signal(sv="G01", band="L1", code="C")
...         builder.set_value(ei, sig, "SNR", 42.0)
>>> ds = builder.build()   # validated Dataset

Source code in packages/canvod-readers/src/canvod/readers/builder.py
class DatasetBuilder:
    """Guided builder for constructing valid GNSSDataReader output Datasets.

    Handles coordinate arrays, dtype enforcement, frequency resolution,
    and contract validation automatically.

    Parameters
    ----------
    reader : GNSSDataReader
        The reader instance (used for ``_build_attrs()`` and file hash).
    aggregate_glonass_fdma : bool, optional
        Whether to aggregate GLONASS FDMA channels (default True).

    Examples
    --------
    >>> builder = DatasetBuilder(reader)
    >>> for epoch in reader.iter_epochs():
    ...     ei = builder.add_epoch(epoch.timestamp)
    ...     for obs in epoch.observations:
    ...         sig = builder.add_signal(sv="G01", band="L1", code="C")
    ...         builder.set_value(ei, sig, "SNR", 42.0)
    >>> ds = builder.build()   # validated Dataset
    """

    def __init__(
        self,
        reader: GNSSDataReader,
        *,
        aggregate_glonass_fdma: bool = True,
    ) -> None:
        self._reader = reader
        self._mapper = SignalIDMapper(aggregate_glonass_fdma=aggregate_glonass_fdma)
        self._signals: dict[str, SignalID] = {}
        self._epochs: list[datetime] = []
        self._values: dict[str, dict[tuple[int, str], float]] = {}

    def add_epoch(self, timestamp: datetime) -> int:
        """Register an epoch timestamp. Returns epoch index."""
        self._epochs.append(timestamp)
        return len(self._epochs) - 1

    def add_signal(self, sv: str, band: str, code: str) -> SignalID:
        """Register a signal (idempotent). Returns validated SignalID."""
        sig = SignalID(sv=sv, band=band, code=code)
        self._signals[sig.sid] = sig
        return sig

    def set_value(
        self,
        epoch_idx: int,
        signal: SignalID | str,
        var: str,
        value: float,
    ) -> None:
        """Set a data value for a given epoch, signal, and variable.

        Parameters
        ----------
        epoch_idx : int
            Index returned by :meth:`add_epoch`.
        signal : SignalID or str
            Signal identifier (SignalID or 'SV|band|code' string).
        var : str
            Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
        value : float
            The observation value.
        """
        sid = str(signal)
        if var not in self._values:
            self._values[var] = {}
        self._values[var][(epoch_idx, sid)] = value

    def build(
        self,
        keep_data_vars: list[str] | None = None,
        extra_attrs: dict[str, str] | None = None,
    ) -> xr.Dataset:
        """Build, validate, and return the Dataset.

        1. Sorts signals alphabetically
        2. Resolves frequencies from band names via SignalIDMapper
        3. Constructs coordinate arrays with correct dtypes (float32 for freq)
        4. Attaches CF-compliant metadata from COORDS_METADATA
        5. Calls validate_dataset() before returning

        Parameters
        ----------
        keep_data_vars : list of str, optional
            If provided, only include these data variables.  If ``None``,
            includes all variables that had values set.
        extra_attrs : dict, optional
            Additional global attributes to merge into the Dataset.

        Returns
        -------
        xr.Dataset
            Validated Dataset with dimensions ``(epoch, sid)``.
        """
        sorted_sids = sorted(self._signals)
        sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
        n_epochs = len(self._epochs)
        n_sids = len(sorted_sids)

        # --- Coordinate arrays ---
        epoch_arr = [
            np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
            for ts in self._epochs
        ]
        sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
        system_arr = np.array(
            [self._signals[s].system for s in sorted_sids], dtype=object
        )
        band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
        code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

        # Frequency resolution via SignalIDMapper
        freq_center = np.array(
            [
                self._mapper.get_band_frequency(self._signals[s].band) or np.nan
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        bandwidths = np.array(
            [
                self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
                for s in sorted_sids
            ],
            dtype=np.float32,
        )
        freq_min = (freq_center - bandwidths / 2).astype(np.float32)
        freq_max = (freq_center + bandwidths / 2).astype(np.float32)

        # --- Determine which variables to include ---
        all_vars = set(self._values.keys())
        if keep_data_vars is not None:
            vars_to_build = [v for v in keep_data_vars if v in all_vars]
        else:
            vars_to_build = sorted(all_vars)

        # --- Data variable arrays ---
        data_vars: dict[str, tuple] = {}
        for var in vars_to_build:
            dtype = DTYPES.get(var, np.dtype("float32"))
            fill = np.nan if np.issubdtype(dtype, np.floating) else -1
            arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

            for (ei, sid_str), val in self._values[var].items():
                if sid_str in sid_to_idx:
                    arr[ei, sid_to_idx[sid_str]] = val

            meta = _VAR_METADATA.get(var, {})
            data_vars[var] = (("epoch", "sid"), arr, meta)

        # --- Coordinates ---
        coords = {
            "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
            "sid": xr.DataArray(
                np.array(sorted_sids, dtype=object),
                dims=["sid"],
                attrs=COORDS_METADATA["sid"],
            ),
            "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
            "system": ("sid", system_arr, COORDS_METADATA["system"]),
            "band": ("sid", band_arr, COORDS_METADATA["band"]),
            "code": ("sid", code_arr, COORDS_METADATA["code"]),
            "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
            "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
            "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
        }

        # --- Global attributes ---
        attrs = self._reader._build_attrs()
        if extra_attrs:
            attrs.update(extra_attrs)

        ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

        # Validate before returning
        validate_dataset(ds, required_vars=keep_data_vars)

        return ds

add_epoch(timestamp)

Register an epoch timestamp. Returns epoch index.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_epoch(self, timestamp: datetime) -> int:
    """Register an epoch timestamp. Returns epoch index."""
    self._epochs.append(timestamp)
    return len(self._epochs) - 1

add_signal(sv, band, code)

Register a signal (idempotent). Returns validated SignalID.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def add_signal(self, sv: str, band: str, code: str) -> SignalID:
    """Register a signal (idempotent). Returns validated SignalID."""
    sig = SignalID(sv=sv, band=band, code=code)
    self._signals[sig.sid] = sig
    return sig

set_value(epoch_idx, signal, var, value)

Set a data value for a given epoch, signal, and variable.

Parameters

epoch_idx : int
    Index returned by :meth:`add_epoch`.
signal : SignalID or str
    Signal identifier (SignalID or 'SV|band|code' string).
var : str
    Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
value : float
    The observation value.

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def set_value(
    self,
    epoch_idx: int,
    signal: SignalID | str,
    var: str,
    value: float,
) -> None:
    """Set a data value for a given epoch, signal, and variable.

    Parameters
    ----------
    epoch_idx : int
        Index returned by :meth:`add_epoch`.
    signal : SignalID or str
        Signal identifier (SignalID or 'SV|band|code' string).
    var : str
        Variable name (e.g. 'SNR', 'Pseudorange', 'Phase').
    value : float
        The observation value.
    """
    sid = str(signal)
    if var not in self._values:
        self._values[var] = {}
    self._values[var][(epoch_idx, sid)] = value

build(keep_data_vars=None, extra_attrs=None)

Build, validate, and return the Dataset.

  1. Sorts signals alphabetically
  2. Resolves frequencies from band names via SignalIDMapper
  3. Constructs coordinate arrays with correct dtypes (float32 for freq)
  4. Attaches CF-compliant metadata from COORDS_METADATA
  5. Calls validate_dataset() before returning

Parameters

keep_data_vars : list of str, optional
    If provided, only include these data variables. If None, includes all variables that had values set.
extra_attrs : dict, optional
    Additional global attributes to merge into the Dataset.

Returns

xr.Dataset
    Validated Dataset with dimensions (epoch, sid).

Source code in packages/canvod-readers/src/canvod/readers/builder.py
def build(
    self,
    keep_data_vars: list[str] | None = None,
    extra_attrs: dict[str, str] | None = None,
) -> xr.Dataset:
    """Build, validate, and return the Dataset.

    1. Sorts signals alphabetically
    2. Resolves frequencies from band names via SignalIDMapper
    3. Constructs coordinate arrays with correct dtypes (float32 for freq)
    4. Attaches CF-compliant metadata from COORDS_METADATA
    5. Calls validate_dataset() before returning

    Parameters
    ----------
    keep_data_vars : list of str, optional
        If provided, only include these data variables.  If ``None``,
        includes all variables that had values set.
    extra_attrs : dict, optional
        Additional global attributes to merge into the Dataset.

    Returns
    -------
    xr.Dataset
        Validated Dataset with dimensions ``(epoch, sid)``.
    """
    sorted_sids = sorted(self._signals)
    sid_to_idx = {sid: i for i, sid in enumerate(sorted_sids)}
    n_epochs = len(self._epochs)
    n_sids = len(sorted_sids)

    # --- Coordinate arrays ---
    epoch_arr = [
        np.datetime64(ts.replace(tzinfo=None) if ts.tzinfo else ts, "ns")
        for ts in self._epochs
    ]
    sv_arr = np.array([self._signals[s].sv for s in sorted_sids], dtype=object)
    system_arr = np.array(
        [self._signals[s].system for s in sorted_sids], dtype=object
    )
    band_arr = np.array([self._signals[s].band for s in sorted_sids], dtype=object)
    code_arr = np.array([self._signals[s].code for s in sorted_sids], dtype=object)

    # Frequency resolution via SignalIDMapper
    freq_center = np.array(
        [
            self._mapper.get_band_frequency(self._signals[s].band) or np.nan
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    bandwidths = np.array(
        [
            self._mapper.get_band_bandwidth(self._signals[s].band) or 0.0
            for s in sorted_sids
        ],
        dtype=np.float32,
    )
    freq_min = (freq_center - bandwidths / 2).astype(np.float32)
    freq_max = (freq_center + bandwidths / 2).astype(np.float32)

    # --- Determine which variables to include ---
    all_vars = set(self._values.keys())
    if keep_data_vars is not None:
        vars_to_build = [v for v in keep_data_vars if v in all_vars]
    else:
        vars_to_build = sorted(all_vars)

    # --- Data variable arrays ---
    data_vars: dict[str, tuple] = {}
    for var in vars_to_build:
        dtype = DTYPES.get(var, np.dtype("float32"))
        fill = np.nan if np.issubdtype(dtype, np.floating) else -1
        arr = np.full((n_epochs, n_sids), fill, dtype=dtype)

        for (ei, sid_str), val in self._values[var].items():
            if sid_str in sid_to_idx:
                arr[ei, sid_to_idx[sid_str]] = val

        meta = _VAR_METADATA.get(var, {})
        data_vars[var] = (("epoch", "sid"), arr, meta)

    # --- Coordinates ---
    coords = {
        "epoch": ("epoch", epoch_arr, COORDS_METADATA["epoch"]),
        "sid": xr.DataArray(
            np.array(sorted_sids, dtype=object),
            dims=["sid"],
            attrs=COORDS_METADATA["sid"],
        ),
        "sv": ("sid", sv_arr, COORDS_METADATA["sv"]),
        "system": ("sid", system_arr, COORDS_METADATA["system"]),
        "band": ("sid", band_arr, COORDS_METADATA["band"]),
        "code": ("sid", code_arr, COORDS_METADATA["code"]),
        "freq_center": ("sid", freq_center, COORDS_METADATA["freq_center"]),
        "freq_min": ("sid", freq_min, COORDS_METADATA["freq_min"]),
        "freq_max": ("sid", freq_max, COORDS_METADATA["freq_max"]),
    }

    # --- Global attributes ---
    attrs = self._reader._build_attrs()
    if extra_attrs:
        attrs.update(extra_attrs)

    ds = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)

    # Validate before returning
    validate_dataset(ds, required_vars=keep_data_vars)

    return ds

GNSS Specifications

GNSS specifications and core characteristics.

This module contains fundamental GNSS definitions including:

- Constants: Unit registry, physical constants, RINEX parameters
- Exceptions: GNSS-specific error types
- Metadata: CF-compliant metadata for coordinates and observables
- Models: Pydantic validation models for RINEX data structures
- Signals: Signal ID mapping and band properties
- Utils: File hashing, version extraction, data type checks

These components are used across all GNSS reader implementations.

Directory Matching

Directory matching for RINEX data files.

Identifies and matches RINEX data directories across dates and receivers.

DataDirMatcher

Match RINEX data directories for canopy and reference receivers.

Scans a root directory structure to find dates with RINEX files present in both canopy and reference receiver directories.

Parameters

root : Path
    Root directory containing receiver subdirectories
reference_pattern : Path, optional
    Relative path pattern for reference receiver (default: "01_reference/01_GNSS/01_raw")
canopy_pattern : Path, optional
    Relative path pattern for canopy receiver (default: "02_canopy/01_GNSS/01_raw")

Examples

>>> from pathlib import Path
>>> matcher = DataDirMatcher(
...     root=Path("/data/01_Rosalia"),
...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
... )

>>> # Iterate over matched directories
>>> for matched_dirs in matcher:
...     print(matched_dirs.yyyydoy)
...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
...     print(f"  Found {len(rinex_files)} RINEX files")

>>> # Get list of common dates
>>> dates = matcher.get_common_dates()
>>> print(f"Found {len(dates)} dates with data")
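The date-directory names can be decoded without canvod; the helper below is a hypothetical sketch assuming two-digit-year "YYDOY" strings (e.g. "25032"), the format that `YYYYDOY.from_yydoy_str` appears to consume:

```python
from datetime import date, timedelta


def yydoy_to_date(s: str) -> date:
    """Convert a 'YYDOY' string such as '25032' to a calendar date.

    Illustrative helper, not canvod API; assumes years 2000-2099.
    """
    yy, doy = int(s[:2]), int(s[2:])
    # Day-of-year 1 is January 1st of the given year.
    return date(2000 + yy, 1, 1) + timedelta(days=doy - 1)
```
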

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class DataDirMatcher:
    """Match RINEX data directories for canopy and reference receivers.

    Scans a root directory structure to find dates with RINEX files
    present in both canopy and reference receiver directories.

    Parameters
    ----------
    root : Path
        Root directory containing receiver subdirectories
    reference_pattern : Path, optional
        Relative path pattern for reference receiver
        (default: "01_reference/01_GNSS/01_raw")
    canopy_pattern : Path, optional
        Relative path pattern for canopy receiver
        (default: "02_canopy/01_GNSS/01_raw")

    Examples
    --------
    >>> from pathlib import Path
    >>> matcher = DataDirMatcher(
    ...     root=Path("/data/01_Rosalia"),
    ...     reference_pattern=Path("01_reference/01_GNSS/01_raw"),
    ...     canopy_pattern=Path("02_canopy/01_GNSS/01_raw")
    ... )
    >>>
    >>> # Iterate over matched directories
    >>> for matched_dirs in matcher:
    ...     print(matched_dirs.yyyydoy)
    ...     rinex_files = list(matched_dirs.canopy_data_dir.glob("*.25o"))
    ...     print(f"  Found {len(rinex_files)} RINEX files")

    >>> # Get list of common dates
    >>> dates = matcher.get_common_dates()
    >>> print(f"Found {len(dates)} dates with data")

    """

    def __init__(
        self,
        root: Path,
        reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
        canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
    ) -> None:
        """Initialize matcher with directory structure."""
        import warnings

        warnings.warn(
            "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.root = Path(root)
        self.reference_dir = self.root / reference_pattern
        self.canopy_dir = self.root / canopy_pattern

        # Validate directories exist
        self._validate_directory(self.root, "Root")
        self._validate_directory(self.reference_dir, "Reference")
        self._validate_directory(self.canopy_dir, "Canopy")

    def __iter__(self) -> Iterator[MatchedDirs]:
        """Iterate over matched directory pairs with RINEX files.

        Yields
        ------
        MatchedDirs
            Matched directories for each date.

        """
        for date_str in self.get_common_dates():
            yield MatchedDirs(
                canopy_data_dir=self.canopy_dir / date_str,
                reference_data_dir=self.reference_dir / date_str,
                yyyydoy=YYYYDOY.from_yydoy_str(date_str),
            )

    def get_common_dates(self) -> list[str]:
        """Get dates with RINEX files in both receivers.

        Uses parallel processing to check directories efficiently.

        Returns
        -------
        list[str]
            Sorted list of date strings (YYDDD format, e.g., "25001")
            that have RINEX files in both canopy and reference directories.

        """
        # Find dates with RINEX in each receiver
        ref_dates = self._get_dates_with_rinex(self.reference_dir)
        can_dates = self._get_dates_with_rinex(self.canopy_dir)

        # Find intersection
        common = ref_dates & can_dates
        common.discard("00000")  # Remove placeholder directories

        # Sort naturally (numerical order)
        return natsorted(common)

    def _get_dates_with_rinex(self, base_dir: Path) -> set[str]:
        """Find all date directories containing RINEX files.

        Uses parallel processing to check multiple directories at once.

        Parameters
        ----------
        base_dir : Path
            Base directory to search (e.g., canopy or reference root).

        Returns
        -------
        set[str]
            Set of date directory names that contain RINEX files.

        """
        # Get all subdirectories
        date_dirs = (d for d in base_dir.iterdir() if d.is_dir())

        # Check for RINEX files in parallel
        dates_with_rinex = set()

        with ThreadPoolExecutor() as executor:
            future_to_dir = {
                executor.submit(self._has_rinex_files, d): d for d in date_dirs
            }

            for future in as_completed(future_to_dir):
                directory = future_to_dir[future]
                if future.result():
                    dates_with_rinex.add(directory.name)

        return dates_with_rinex

    @staticmethod
    def _has_rinex_files(directory: Path) -> bool:
        """Check if directory contains RINEX observation files.

        Parameters
        ----------
        directory : Path
            Directory to check.

        Returns
        -------
        bool
            True if RINEX files found.

        """
        return _has_rinex_files(directory)

    def _validate_directory(self, path: Path, name: str) -> None:
        """Validate directory exists.

        Parameters
        ----------
        path : Path
            Directory to check.
        name : str
            Name for error message.

        Raises
        ------
        FileNotFoundError
            If directory doesn't exist.

        """
        if not path.exists():
            msg = f"{name} directory not found: {path}"
            raise FileNotFoundError(msg)

__init__(root, reference_pattern=Path('01_reference/01_GNSS/01_raw'), canopy_pattern=Path('02_canopy/01_GNSS/01_raw'))

Initialize matcher with directory structure.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    root: Path,
    reference_pattern: Path = Path("01_reference/01_GNSS/01_raw"),
    canopy_pattern: Path = Path("02_canopy/01_GNSS/01_raw"),
) -> None:
    """Initialize matcher with directory structure."""
    import warnings

    warnings.warn(
        "DataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.root = Path(root)
    self.reference_dir = self.root / reference_pattern
    self.canopy_dir = self.root / canopy_pattern

    # Validate directories exist
    self._validate_directory(self.root, "Root")
    self._validate_directory(self.reference_dir, "Reference")
    self._validate_directory(self.canopy_dir, "Canopy")

__iter__()

Iterate over matched directory pairs with RINEX files.

Yields

MatchedDirs
    Matched directories for each date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[MatchedDirs]:
    """Iterate over matched directory pairs with RINEX files.

    Yields
    ------
    MatchedDirs
        Matched directories for each date.

    """
    for date_str in self.get_common_dates():
        yield MatchedDirs(
            canopy_data_dir=self.canopy_dir / date_str,
            reference_data_dir=self.reference_dir / date_str,
            yyyydoy=YYYYDOY.from_yydoy_str(date_str),
        )

get_common_dates()

Get dates with RINEX files in both receivers.

Uses parallel processing to check directories efficiently.

Returns

list[str]
    Sorted list of date strings (YYDDD format, e.g., "25001") that have RINEX files in both canopy and reference directories.
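The YYDDD strings returned here pack a two-digit year and a day-of-year. A stdlib sketch of converting such a string to a calendar date, assuming all years fall in 2000-2099:

```python
from datetime import datetime

def yyddd_to_date(yyddd: str) -> datetime:
    """Parse a YYDDD string like '25001' (assumes 20xx years)."""
    # %j is the zero-padded day of year (001-366).
    return datetime.strptime("20" + yyddd, "%Y%j")
```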

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def get_common_dates(self) -> list[str]:
    """Get dates with RINEX files in both receivers.

    Uses parallel processing to check directories efficiently.

    Returns
    -------
    list[str]
        Sorted list of date strings (YYDDD format, e.g., "25001")
        that have RINEX files in both canopy and reference directories.

    """
    # Find dates with RINEX in each receiver
    ref_dates = self._get_dates_with_rinex(self.reference_dir)
    can_dates = self._get_dates_with_rinex(self.canopy_dir)

    # Find intersection
    common = ref_dates & can_dates
    common.discard("00000")  # Remove placeholder directories

    # Sort naturally (numerical order)
    return natsorted(common)
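The matching logic in `get_common_dates` reduces to a set intersection over date-directory names. A self-contained illustration, using plain `sorted` with a numeric key as a stand-in for `natsorted` (for pure digit strings the two give the same order):

```python
# Date directories found under each receiver (toy data).
ref_dates = {"25001", "25002", "25010", "00000"}
can_dates = {"25002", "25010", "25011", "00000"}

# Keep only dates present in both receivers.
common = ref_dates & can_dates
common.discard("00000")  # drop placeholder directories

# natsorted orders "25002" before "25010"; sorting digit strings
# by integer value reproduces that natural order.
dates = sorted(common, key=int)
```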

PairDataDirMatcher

Match RINEX directories for receiver pairs across dates.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site. Requires a configuration dict specifying receiver locations and analysis pairs.

Parameters

base_dir : Path
    Root directory containing all receiver data.
receivers : dict
    Receiver configuration mapping receiver names to their directory paths. The directory value is the full relative path from base_dir to the raw RINEX data directory (before the {YYDOY} date folders).
    Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"}, "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
analysis_pairs : dict
    Analysis pair configuration specifying which receivers to match.
    Example: {"pair_01": {"canopy_receiver": "canopy_01", "reference_receiver": "reference_01"}}

Examples

>>> receivers = {
...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
... }
>>> pairs = {
...     "main_pair": {
...         "canopy_receiver": "canopy_01",
...         "reference_receiver": "reference_01"
...     }
... }

>>> matcher = PairDataDirMatcher(
...     base_dir=Path("/data/01_Rosalia"),
...     receivers=receivers,
...     analysis_pairs=pairs
... )

>>> for matched in matcher:
...     print(f"{matched.yyyydoy}: {matched.pair_name}")
...     print(f"  Canopy: {matched.canopy_data_dir}")
...     print(f"  Reference: {matched.reference_data_dir}")

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
class PairDataDirMatcher:
    """Match RINEX directories for receiver pairs across dates.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site. Requires a configuration dict
    specifying receiver locations and analysis pairs.

    Parameters
    ----------
    base_dir : Path
        Root directory containing all receiver data
    receivers : dict
        Receiver configuration mapping receiver names to their directory paths.
        The ``directory`` value is the full relative path from ``base_dir`` to the
        raw RINEX data directory (before the ``{YYDOY}`` date folders).
        Example: {"canopy_01": {"directory": "02_canopy_01/01_GNSS/01_raw"},
                  "reference_01": {"directory": "01_reference_01/01_GNSS/01_raw"}}
    analysis_pairs : dict
        Analysis pair configuration specifying which receivers to match
        Example: {"pair_01": {"canopy_receiver": "canopy_01",
                               "reference_receiver": "reference_01"}}

    Examples
    --------
    >>> receivers = {
    ...     "canopy_01": {"directory": "02_canopy/01_GNSS/01_raw"},
    ...     "reference_01": {"directory": "01_reference/01_GNSS/01_raw"}
    ... }
    >>> pairs = {
    ...     "main_pair": {
    ...         "canopy_receiver": "canopy_01",
    ...         "reference_receiver": "reference_01"
    ...     }
    ... }
    >>>
    >>> matcher = PairDataDirMatcher(
    ...     base_dir=Path("/data/01_Rosalia"),
    ...     receivers=receivers,
    ...     analysis_pairs=pairs
    ... )
    >>>
    >>> for matched in matcher:
    ...     print(f"{matched.yyyydoy}: {matched.pair_name}")
    ...     print(f"  Canopy: {matched.canopy_data_dir}")
    ...     print(f"  Reference: {matched.reference_data_dir}")

    """

    def __init__(
        self,
        base_dir: Path,
        receivers: dict[str, dict[str, str]],
        analysis_pairs: dict[str, dict[str, str]],
    ) -> None:
        """Initialize pair matcher with receiver configuration."""
        import warnings

        warnings.warn(
            "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
            "with DataDirectoryValidator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.base_dir = Path(base_dir)
        self.receivers = receivers
        self.analysis_pairs = analysis_pairs

        # Validate receivers have directory config
        self.receiver_dirs = self._build_receiver_dir_mapping()

    def _build_receiver_dir_mapping(self) -> dict[str, str]:
        """Map receiver names to their directory prefixes.

        Returns
        -------
        dict[str, str]
            Mapping of receiver name to directory path.

        Raises
        ------
        ValueError
            If receiver missing 'directory' in config.

        """
        mapping = {}
        for receiver_name, config in self.receivers.items():
            if "directory" not in config:
                msg = f"Receiver '{receiver_name}' missing 'directory' in config"
                raise ValueError(msg)
            mapping[receiver_name] = config["directory"]
        return mapping

    def _get_receiver_path(self, receiver_name: str, yyyydoy: YYYYDOY) -> Path:
        """Build full path to receiver data for a specific date.

        Parameters
        ----------
        receiver_name : str
            Receiver name (e.g., "canopy_01").
        yyyydoy : YYYYDOY
            Date object.

        Returns
        -------
        Path
            Full path to receiver's RINEX directory for the date.

        """
        receiver_dir = self.receiver_dirs[receiver_name]

        # Convert YYYYDDD to YYDDD format for directory name
        yyddd_str = yyyydoy.yydoy
        if yyddd_str is None:
            msg = f"Missing YYDDD representation for date {yyyydoy}"
            raise ValueError(msg)

        return self.base_dir / receiver_dir / yyddd_str

    def _get_all_dates(self) -> set[YYYYDOY]:
        """Find all dates that have data in any receiver directory.

        Returns
        -------
        set[YYYYDOY]
            Set of all dates with available data.

        """
        all_dates = set()

        for receiver_name in self.receivers:
            receiver_dir = self.receiver_dirs[receiver_name]
            receiver_base = self.base_dir / receiver_dir

            if not receiver_base.exists():
                continue

            # Find all date directories (format: YYDDD - 5 digits)
            for date_dir in receiver_base.iterdir():
                if not date_dir.is_dir():
                    continue

                # Check if directory name is 5 digits
                if len(date_dir.name) != DATE_DIR_LEN or not date_dir.name.isdigit():
                    continue

                # Skip placeholder directories
                if date_dir.name == "00000":
                    continue

                try:
                    yyyydoy = YYYYDOY.from_yydoy_str(date_dir.name)
                    all_dates.add(yyyydoy)
                except ValueError:
                    continue

        return all_dates

    def __iter__(self) -> Iterator[PairMatchedDirs]:
        """Iterate over all date/pair combinations with available data.

        Yields
        ------
        PairMatchedDirs
            Matched directories for a receiver pair on a specific date.

        """
        all_dates = sorted(self._get_all_dates())

        for yyyydoy in all_dates:
            # For each configured analysis pair
            for pair_name, pair_config in self.analysis_pairs.items():
                canopy_rx = pair_config["canopy_receiver"]
                reference_rx = pair_config["reference_receiver"]

                # Build paths for this pair
                canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
                reference_path = self._get_receiver_path(reference_rx, yyyydoy)

                # Check for RINEX files
                canopy_has_files = _has_rinex_files(canopy_path)
                reference_has_files = _has_rinex_files(reference_path)

                # Only yield if both directories exist and have data
                if canopy_has_files and reference_has_files:
                    yield PairMatchedDirs(
                        yyyydoy=yyyydoy,
                        pair_name=pair_name,
                        canopy_receiver=canopy_rx,
                        reference_receiver=reference_rx,
                        canopy_data_dir=canopy_path,
                        reference_data_dir=reference_path,
                    )

__init__(base_dir, receivers, analysis_pairs)

Initialize pair matcher with receiver configuration.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __init__(
    self,
    base_dir: Path,
    receivers: dict[str, dict[str, str]],
    analysis_pairs: dict[str, dict[str, str]],
) -> None:
    """Initialize pair matcher with receiver configuration."""
    import warnings

    warnings.warn(
        "PairDataDirMatcher is deprecated. Use canvod.virtualiconvname.FilenameMapper "
        "with DataDirectoryValidator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    self.base_dir = Path(base_dir)
    self.receivers = receivers
    self.analysis_pairs = analysis_pairs

    # Validate receivers have directory config
    self.receiver_dirs = self._build_receiver_dir_mapping()

__iter__()

Iterate over all date/pair combinations with available data.

Yields

PairMatchedDirs
    Matched directories for a receiver pair on a specific date.

Source code in packages/canvod-readers/src/canvod/readers/matching/dir_matcher.py
def __iter__(self) -> Iterator[PairMatchedDirs]:
    """Iterate over all date/pair combinations with available data.

    Yields
    ------
    PairMatchedDirs
        Matched directories for a receiver pair on a specific date.

    """
    all_dates = sorted(self._get_all_dates())

    for yyyydoy in all_dates:
        # For each configured analysis pair
        for pair_name, pair_config in self.analysis_pairs.items():
            canopy_rx = pair_config["canopy_receiver"]
            reference_rx = pair_config["reference_receiver"]

            # Build paths for this pair
            canopy_path = self._get_receiver_path(canopy_rx, yyyydoy)
            reference_path = self._get_receiver_path(reference_rx, yyyydoy)

            # Check for RINEX files
            canopy_has_files = _has_rinex_files(canopy_path)
            reference_has_files = _has_rinex_files(reference_path)

            # Only yield if both directories exist and have data
            if canopy_has_files and reference_has_files:
                yield PairMatchedDirs(
                    yyyydoy=yyyydoy,
                    pair_name=pair_name,
                    canopy_receiver=canopy_rx,
                    reference_receiver=reference_rx,
                    canopy_data_dir=canopy_path,
                    reference_data_dir=reference_path,
                )

MatchedDirs dataclass

Matched directory paths for canopy and reference receivers.

Immutable container representing a pair of directories containing RINEX data for the same date.

Parameters

canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference (open-sky) receiver RINEX directory.
yyyydoy : YYYYDOY
    Date object for this matched pair.

Examples

>>> from pathlib import Path
>>> from canvod.utils.tools import YYYYDOY
>>>
>>> md = MatchedDirs(
...     canopy_data_dir=Path("/data/02_canopy/25001"),
...     reference_data_dir=Path("/data/01_reference/25001"),
...     yyyydoy=YYYYDOY.from_str("2025001")
... )
>>> md.yyyydoy.to_str()
'2025001'
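MatchedDirs is declared with `@dataclass(frozen=True)`, so attribute assignment after construction raises `FrozenInstanceError`. A stdlib illustration of the same pattern using a stand-in class (not the real MatchedDirs):

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path

@dataclass(frozen=True)
class FrozenDirs:  # stand-in for MatchedDirs, for illustration
    canopy_data_dir: Path
    reference_data_dir: Path

fd = FrozenDirs(Path("/data/canopy/25001"), Path("/data/ref/25001"))

mutated = True
try:
    fd.canopy_data_dir = Path("/elsewhere")  # rejected by frozen=True
except FrozenInstanceError:
    mutated = False
```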

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass(frozen=True)
class MatchedDirs:
    """Matched directory paths for canopy and reference receivers.

    Immutable container representing a pair of directories containing
    RINEX data for the same date.

    Parameters
    ----------
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference (open-sky) receiver RINEX directory.
    yyyydoy : YYYYDOY
        Date object for this matched pair.

    Examples
    --------
    >>> from pathlib import Path
    >>> from canvod.utils.tools import YYYYDOY
    >>>
    >>> md = MatchedDirs(
    ...     canopy_data_dir=Path("/data/02_canopy/25001"),
    ...     reference_data_dir=Path("/data/01_reference/25001"),
    ...     yyyydoy=YYYYDOY.from_str("2025001")
    ... )
    >>> md.yyyydoy.to_str()
    '2025001'

    """

    canopy_data_dir: Path
    reference_data_dir: Path
    yyyydoy: YYYYDOY

PairMatchedDirs dataclass

Matched directories for a receiver pair on a specific date.

Supports multi-receiver configurations where multiple canopy/reference pairs may exist at the same site.

Parameters

yyyydoy : YYYYDOY
    Date for this matched pair.
pair_name : str
    Identifier for this receiver pair (e.g., "pair_01").
canopy_receiver : str
    Name of canopy receiver (e.g., "canopy_01").
reference_receiver : str
    Name of reference receiver (e.g., "reference_01").
canopy_data_dir : Path
    Path to canopy receiver RINEX directory.
reference_data_dir : Path
    Path to reference receiver RINEX directory.

Examples

>>> pmd = PairMatchedDirs(
...     yyyydoy=YYYYDOY.from_str("2025001"),
...     pair_name="pair_01",
...     canopy_receiver="canopy_01",
...     reference_receiver="reference_01",
...     canopy_data_dir=Path("/data/canopy_01/25001"),
...     reference_data_dir=Path("/data/reference_01/25001")
... )
>>> pmd.pair_name
'pair_01'

Source code in packages/canvod-readers/src/canvod/readers/matching/models.py
@dataclass
class PairMatchedDirs:
    """Matched directories for a receiver pair on a specific date.

    Supports multi-receiver configurations where multiple canopy/reference
    pairs may exist at the same site.

    Parameters
    ----------
    yyyydoy : YYYYDOY
        Date for this matched pair.
    pair_name : str
        Identifier for this receiver pair (e.g., "pair_01").
    canopy_receiver : str
        Name of canopy receiver (e.g., "canopy_01").
    reference_receiver : str
        Name of reference receiver (e.g., "reference_01").
    canopy_data_dir : Path
        Path to canopy receiver RINEX directory.
    reference_data_dir : Path
        Path to reference receiver RINEX directory.

    Examples
    --------
    >>> pmd = PairMatchedDirs(
    ...     yyyydoy=YYYYDOY.from_str("2025001"),
    ...     pair_name="pair_01",
    ...     canopy_receiver="canopy_01",
    ...     reference_receiver="reference_01",
    ...     canopy_data_dir=Path("/data/canopy_01/25001"),
    ...     reference_data_dir=Path("/data/reference_01/25001")
    ... )
    >>> pmd.pair_name
    'pair_01'

    """

    yyyydoy: YYYYDOY
    pair_name: str
    canopy_receiver: str
    reference_receiver: str
    canopy_data_dir: Path
    reference_data_dir: Path