Implement-median-model-v4 #793
base: release/v4.0.0
Conversation
egordm
left a comment
Great changes. I have only a few nitpicks about the tests, and possibly about moving some logic to the time series dataset.
packages/openstef-models/tests/unit/models/forecasting/test_median_forecaster.py
def _infer_frequency(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Infer the frequency of a pandas DatetimeIndex if the freq attribute is not set.

    This method calculates the most common time difference between consecutive timestamps,
    which is more permissive of missing chunks of data than the pandas infer_freq method.

    Args:
        index (pd.DatetimeIndex): The datetime index to infer the frequency from.

    Returns:
        pd.Timedelta: The inferred frequency as a pandas Timedelta.

    Raises:
        ValueError: If the index has fewer than 2 timestamps.
    """
    minimum_required_length = 2
    if len(index) < minimum_required_length:
        raise ValueError("Cannot infer frequency from an index with fewer than 2 timestamps.")

    # Calculate the differences between consecutive timestamps
    deltas = index.to_series().diff().dropna()

    # Find the most common difference
    return deltas.mode().iloc[0]

def _frequency_matches(self, index: pd.DatetimeIndex) -> bool:
    """Check if the frequency of the input data matches the model frequency.

    Args:
        index (pd.DatetimeIndex): The input data to check.

    Returns:
        bool: True if the frequencies match, False otherwise.
    """
    input_frequency = self._infer_frequency(index) if index.freq is None else index.freq
    return input_frequency == self.frequency
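To illustrate why the mode-based approach is more permissive than pandas' built-in inference, here is a small standalone sketch (the example index is mine, not from the PR):

```python
import pandas as pd

# A 15-minute index with a missing chunk between 00:45 and 02:00.
index = pd.DatetimeIndex(
    list(pd.date_range("2025-01-01 00:00", periods=4, freq="15min"))
    + list(pd.date_range("2025-01-01 02:00", periods=4, freq="15min"))
)

# pandas' built-in inference gives up on the irregular index...
print(pd.infer_freq(index))  # None

# ...while the mode of the consecutive differences still recovers 15 minutes.
deltas = index.to_series().diff().dropna()
print(deltas.mode().iloc[0])  # 0 days 00:15:00
```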
Maybe it would be nice to move this to TimeSeriesDataset, as a single function called validate_sample_interval that checks the data against the set sample interval. If the user wants to be sure, they can call it.
It would make this easier to test, and the median model code would be a lot simpler.
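A hypothetical sketch of what such a validate_sample_interval could look like (the function name comes from this comment; the signature and body are my assumptions, not the actual TimeSeriesDataset API):

```python
import pandas as pd

def validate_sample_interval(index: pd.DatetimeIndex, sample_interval: pd.Timedelta) -> None:
    """Raise if the index deviates from the configured sample interval.

    Hypothetical helper matching the suggestion above; exact multiples of the
    interval are allowed so that missing chunks of data do not fail validation.
    """
    deltas = index.to_series().diff().dropna()
    offending = deltas[deltas % sample_interval != pd.Timedelta(0)]
    if not offending.empty:
        raise ValueError(
            f"Data does not conform to the sample interval {sample_interval}: "
            f"first offending timestamp is {offending.index[0]}."
        )

# An index with a gap still validates, since the 45-minute hole is a multiple of 15 minutes.
ok = pd.date_range("2025-01-01", periods=3, freq="15min").append(
    pd.date_range("2025-01-01 01:00", periods=3, freq="15min")
)
validate_sample_interval(ok, pd.Timedelta("15min"))
```

With this in the dataset, the median model only needs to call the validator instead of carrying its own inference logic.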
egordm
left a comment
Looks great. The only change I would suggest is to avoid actually inferring the timedelta; it's something the user can better provide manually, given how many edge cases there are. We can only do verification. This way we also don't need any changes in stef beam and such.
inferred_freq = pd.Timedelta(
    self._infer_frequency(data.index) if data.index.freq is None else data.index.freq  # type: ignore
)
sample_interval = inferred_freq.to_pytimedelta()
Oh, maybe we shouldn't infer the frequency from the input data? Forcing the user to specify it would be best; the data may contain holes or similar, throwing off the inference.
We should probably only have a validate function that optionally checks whether the data uses the right sample interval, for functions that are sensitive to this, like the median model.
It probably also solves the issues you have been having with failing doctests.
lag_deltas = sorted(self.lags_to_time_deltas_.values())
lag_intervals = [(lag_deltas[i] - lag_deltas[i - 1]).total_seconds() for i in range(1, len(lag_deltas))]
if not all(interval == lag_intervals[0] for interval in lag_intervals):
    msg = (
        "Lag features are not evenly spaced. "
        "Please ensure lag features are evenly spaced and match the data frequency."
    )
    raise ValueError(msg)
Usually I like splitting chunks like this off into generic standalone functions, because they are easy to test in isolation but harder to test inside the model.
For example, here we could have a function check_timedeltas_evenly_spaced, which is easy to test and doesn't require an extra test for the model.
This is more of a personal preference; if it's not worth it, we can skip it here.
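The suggested check_timedeltas_evenly_spaced could be sketched roughly as follows (the function name comes from the comment above; the body is my assumption):

```python
from datetime import timedelta

def check_timedeltas_evenly_spaced(deltas: list[timedelta]) -> bool:
    """Return True if the sorted timedeltas form an evenly spaced sequence.

    Sequences with fewer than three elements are trivially evenly spaced.
    """
    ordered = sorted(deltas)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    return all(interval == intervals[0] for interval in intervals)

print(check_timedeltas_evenly_spaced(
    [timedelta(minutes=15), timedelta(minutes=30), timedelta(minutes=45)]
))  # True
print(check_timedeltas_evenly_spaced(
    [timedelta(minutes=15), timedelta(minutes=30), timedelta(minutes=60)]
))  # False
```

A function like this can be unit-tested directly, so the model tests only need to cover that the error is raised, not the spacing logic itself.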
# Check that lag frequency matches data frequency
expected_lag_interval = lag_intervals[0]
if expected_lag_interval != self.frequency_.total_seconds():
Most of the operations can be done on the timedelta structure directly, so the total_seconds conversion would be unnecessary. In this case it would be a simple comparison.
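For instance, keeping the lag intervals as timedelta objects reduces the check to a direct comparison (variable names mirror the diff above, but this is only a sketch):

```python
from datetime import timedelta

lag_deltas = sorted([timedelta(minutes=15), timedelta(minutes=30), timedelta(minutes=45)])
# Keep the intervals as timedeltas instead of calling total_seconds() on each.
lag_intervals = [b - a for a, b in zip(lag_deltas, lag_deltas[1:])]

frequency_ = timedelta(minutes=15)
expected_lag_interval = lag_intervals[0]

# Timedeltas compare directly; no conversion to seconds needed.
print(expected_lag_interval == frequency_)  # True
```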
Changes proposed in this PR include: