docs: RFC - Multi-Quantile Target Support for ForecastInputDataset #770

Open

egordm wants to merge 3 commits into release/v4.0.0 from feature/rfc-multi-quantile-target-forecast-dataset

docs/source/rfcs/0000-rfc-template.md

@@ -0,0 +1,78 @@
+    # RFC-XXXX: [Title]
+    - **Status**: Draft | Under Review | Accepted | Rejected | Implemented
+    - **Created**: YYYY-MM-DD
+    - **Authors**: @username
+    - **Tracking Issue**: #NNN (if applicable)
+    ## Summary
+    One paragraph explanation of the feature.
+    ## Motivation
+    Why are we doing this? What use cases does it support? What is the expected outcome?
+    Focus on explaining the motivation clearly without diving into implementation details.
+    ## Guide-level Explanation
+    Explain the proposal as if teaching it to another developer. Introduce new concepts, explain by example, and describe the impact on users.
+    This section should help readers understand:
+    - What new capabilities this enables
+    - How existing workflows change (if at all)
+    - Concrete examples of usage
+    ## Reference-level Explanation
+    Technical details of the design. This section should cover:
+    - API changes (new methods, parameters, types)
+    - Data format changes
+    - Interaction with existing components
+    - Edge cases and error handling
+    Keep code examples minimal and focused. Pseudocode is acceptable.
+    ## Drawbacks
+    Why should we *not* do this? Consider:
+    - Implementation cost
+    - Integration complexity
+    - Maintenance burden
+    - Breaking changes (if any)
+    ## Design Decisions
+    Document key design decisions using the format:
+    ### D1: [Decision title]
+    **Decision**: What was decided.
+    **Rationale**: Why this choice was made.
+    **Alternatives considered**: What other options were evaluated and why they were rejected.
+    ---
+    Repeat for each significant design decision (D2, D3, etc.)
+    ## Rationale and Alternatives
+    - Why is this design the best among alternatives?
+    - What other designs were considered and why were they rejected?
+    - What is the impact of not doing this?
+    ## Unresolved Questions
+    - What parts of the design are still TBD?
+    - What related issues are out of scope for this RFC?
+    - What questions need to be resolved during implementation?
+    ## Future Possibilities
+    How might this proposal be extended in the future? What adjacent features might build on this?
+    This section is a place to dump ideas that are related but out of scope for the current RFC.

docs/source/rfcs/0001-multi-quantile-target-forecast-input.md

@@ -0,0 +1,111 @@
+    # RFC-0001: Multi-Quantile Target Support for ForecastInputDataset
+    - **Status**: Under Review
+    - **Created**: 2025-11-27
+    - **Authors**: @egordm
+    - **Tracking Issue**: N/A
+    ## Summary
+    Extend `ForecastInputDataset` to support multiple target series as quantiles (e.g., P10, P50, P90). This enables training models directly on probabilistic targets from upstream forecasters.
+    ## Motivation
+    We're building a **metaforecasting module** with:
+    1. **Stacking Forecaster**: Meta-predictor combining quantile outputs from multiple base forecasters
+    2. **Residual Forecaster**: Trained on residuals of another model to improve combined forecasts
+    Both need multi-quantile targets as "ground truth" for training.
+    ### Why not train N separate models?
+    ```python
+    # Inefficient approach
+    for quantile in [0.1, 0.5, 0.9]:
+        model_q.fit(data_for_quantile_q)  # 3x training time
+    ```
+    Problems:
+    - ~3x training time, redundant feature computation
+    - No shared learning across quantiles
+    - Code complexity managing multiple models
+    XGBoost supports multi-target training natively (`multi_strategy="one_output_per_tree"`). Multi-quantile targets enable efficient joint training in a single pass.
+    ## Design
+    ### API Changes
+    ```python
+    dataset = ForecastInputDataset(data, target_column="load")
+    # New properties
+    dataset.has_quantile_targets  # bool: True if multi-quantile
+    dataset.target_quantiles      # list[Quantile]: e.g., [Q(0.1), Q(0.5), Q(0.9)]
+    dataset.target_quantiles_data # DataFrame with all quantile columns
+    dataset.primary_target_series # P50 if multi-quantile, else single target
+    # Backward compatible
+    dataset.target_series         # Alias for primary_target_series
+    ```
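Note on the API block above: the relationship between `primary_target_series` and the `target_series` alias can be sketched with a toy class. This is only an illustration under assumptions. `ToyDataset` is hypothetical, stores columns in a plain dict instead of a DataFrame, and detects quantile columns by prefix only; it is not the proposed `ForecastInputDataset` implementation.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the proposed properties on a dict of columns."""

    def __init__(self, data: dict[str, list[float]], target_column: str):
        self.data = data
        self.target_column = target_column
        # Assumption: quantile targets are recognised purely by their column prefix.
        prefix = f"{target_column}_quantile_P"
        self._quantile_columns = sorted(c for c in data if c.startswith(prefix))

    @property
    def has_quantile_targets(self) -> bool:
        return bool(self._quantile_columns)

    @property
    def primary_target_series(self) -> list[float]:
        # P50 in multi-quantile mode, the plain target column otherwise.
        if self.has_quantile_targets:
            return self.data[f"{self.target_column}_quantile_P50"]
        return self.data[self.target_column]

    @property
    def target_series(self) -> list[float]:
        # Backward-compatible alias, as described in the RFC.
        return self.primary_target_series


legacy = ToyDataset({"load": [1.0, 2.0]}, target_column="load")
multi = ToyDataset(
    {"load_quantile_P10": [0.5, 1.5], "load_quantile_P50": [1.0, 2.0]},
    target_column="load",
)
print(legacy.has_quantile_targets, legacy.target_series)
print(multi.has_quantile_targets, multi.target_series)
```

Existing callers that only read `target_series` would see the same kind of object in both modes, which is the backward-compatibility property the RFC relies on.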
+    ### Column Naming
+    Pattern: `{target_column}_{quantile.format()}`
+    | Column | Meaning |
+    |--------|---------|
+    | `load` | Single target (legacy) |
+    | `load_quantile_P10` | 10th percentile target |
+    | `load_quantile_P50` | Median target |
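Note on the naming pattern: a minimal sketch of how the column names in the table could be produced. The `Quantile` class below is a hypothetical stand-in, and the output shape of `format()` is inferred from the table rows only. How levels such as 0.05 or 0.975 are rendered is not specified in the RFC and is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantile:
    """Hypothetical stand-in for the project's quantile type."""

    value: float  # probability level in (0, 1), e.g. 0.1 for P10

    def format(self) -> str:
        # Assumed output shape, inferred from the table: 0.1 -> "quantile_P10".
        return f"quantile_P{round(self.value * 100)}"


def target_quantile_column(target_column: str, quantile: Quantile) -> str:
    """Build a column name following `{target_column}_{quantile.format()}`."""
    return f"{target_column}_{quantile.format()}"


columns = [target_quantile_column("load", Quantile(q)) for q in (0.1, 0.5, 0.9)]
print(columns)
```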
+    ### Detection
+    1. If `target_quantiles` param provided → use those
+    2. Else auto-detect columns matching `{target_column}_quantile_P*`
+    3. Else → single-target mode
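Note on detection: the three-step fallback above can be sketched as a single function. This is an illustration under assumptions, not the proposed implementation. `detect_target_quantiles` is a hypothetical helper, quantiles are represented as integer percentiles (10 for P10), and an empty result stands for single-target mode.

```python
import re


def detect_target_quantiles(columns, target_column, target_quantiles=None):
    """Return percentile levels of the quantile targets, or [] for single-target mode."""
    # Step 1: an explicitly provided parameter takes precedence.
    if target_quantiles is not None:
        return sorted(target_quantiles)

    # Step 2: auto-detect columns named `{target_column}_quantile_P<digits>`.
    pattern = re.compile(rf"^{re.escape(target_column)}_quantile_P(\d+)$")
    detected = sorted(int(m.group(1)) for c in columns if (m := pattern.match(c)))

    # Step 3: nothing detected means single-target mode (empty list).
    return detected


print(detect_target_quantiles(["temperature", "load_quantile_P10", "load_quantile_P50"], "load"))
print(detect_target_quantiles(["temperature", "load"], "load"))
```

Anchoring the pattern on the target column name keeps unrelated columns, such as forecast outputs named `quantile_P*`, from being picked up as targets.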
+    ### Validation
+    - Multi-quantile mode requires P50 (for `target_series` compatibility)
+    - All declared quantiles must have corresponding columns
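Note on validation: a minimal sketch of the two rules above. The function name is hypothetical, quantiles are again represented as integer percentiles, and `ValueError` is a placeholder because the RFC does not say which exception the dataset raises.

```python
def validate_quantile_targets(columns, target_column, target_quantiles):
    """Raise if the declared quantile targets violate the two validation rules."""
    if not target_quantiles:
        return  # single-target mode: nothing to validate here

    # Rule 1: P50 must be present so `target_series` keeps working.
    if 50 not in target_quantiles:
        raise ValueError("Multi-quantile mode requires a P50 target.")

    # Rule 2: every declared quantile needs a matching column.
    expected = [f"{target_column}_quantile_P{p}" for p in target_quantiles]
    missing = [name for name in expected if name not in columns]
    if missing:
        raise ValueError(f"Missing target quantile columns: {missing}")


# Passes silently: both declared quantiles have columns and P50 is included.
validate_quantile_targets(["load_quantile_P10", "load_quantile_P50"], "load", [10, 50])
```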
+    ## Design Decisions
+    ### D1: Sample weights use primary target
+    Sample weights are based on `primary_target_series`. Custom forecasters can override this by branching on `has_quantile_targets`.
+    ### D2: Forecasters must validate support
+    Forecasters raise `InputValidationError` if they receive multi-quantile targets but don't support them. Fail-fast prevents silent bugs.
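Note on D2: the fail-fast check could look roughly like the sketch below. `InputValidationError` is redefined locally as a stand-in for the project's exception, and the `supports_quantile_targets` flag and `SingleTargetForecaster` class are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class InputValidationError(ValueError):
    """Local stand-in for the project's exception of the same name."""


class SingleTargetForecaster:
    # Hypothetical capability flag; the RFC does not define how support is declared.
    supports_quantile_targets = False

    def fit(self, data):
        # Fail fast instead of silently training on the wrong target.
        if data.has_quantile_targets and not self.supports_quantile_targets:
            raise InputValidationError(
                f"{type(self).__name__} does not support multi-quantile targets."
            )
        return self


# Minimal stand-ins for datasets; only the attribute used by the check is modelled.
single = SimpleNamespace(has_quantile_targets=False)
multi = SimpleNamespace(has_quantile_targets=True)

SingleTargetForecaster().fit(single)  # accepted
try:
    SingleTargetForecaster().fit(multi)
except InputValidationError as exc:
    print(f"rejected: {exc}")
```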
+    ### D3: Evaluation uses primary target only
+    Always use `primary_target_series` (P50 for multi-quantile) as ground truth. Quantile-to-quantile metrics deferred to future work.
+    ### D4: "Primary" not "median" terminology
+    Single-target datasets aren't necessarily medians (could be mean). `primary_target_series` avoids implying statistical meaning.
+    ### D5: Target quantiles must match forecaster quantiles
+    If forecaster supports multi-quantile targets, `data.target_quantiles` must exactly match `forecaster.config.quantiles`. Prevents undefined training behavior.
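Note on D5: a sketch of the exact-match check. Whether "exactly match" is meant to be order-sensitive is not stated in the RFC; the hypothetical helper below assumes order does not matter and compares sorted lists. `ValueError` is a placeholder for whichever exception the project chooses.

```python
def check_quantiles_match(target_quantiles, forecaster_quantiles):
    """Raise unless both sides declare the same quantile levels (order ignored)."""
    if sorted(target_quantiles) != sorted(forecaster_quantiles):
        raise ValueError(
            f"Target quantiles {sorted(target_quantiles)} do not match "
            f"forecaster quantiles {sorted(forecaster_quantiles)}."
        )


# Same levels in a different order: accepted under the order-insensitive assumption.
check_quantiles_match([0.1, 0.5, 0.9], [0.5, 0.1, 0.9])

try:
    check_quantiles_match([0.1, 0.5, 0.9], [0.5])
except ValueError as exc:
    print(f"rejected: {exc}")
```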
+    ### D6: No target quantiles in ForecastDataset
+    `ForecastDataset` (output) does not gain target quantile support. No current use case for quantile targets in evaluation or post-prediction workflows. Can be added later if needed.
+    ## Drawbacks
+    - Adds conditional logic to dataset class
+    - New column naming convention to document
+    - Forecasters need updates to leverage feature
+    ## Alternatives Considered
+    | Alternative | Why Rejected |
+    |-------------|--------------|
+    | Separate `MultiQuantileTargetDataset` class | Union types everywhere, code duplication |
+    | Use `quantile_P*` naming (like ForecastDataset) | Conflicts with forecast output columns |
+    ## Future Possibilities
+    - Multi-output training on all quantile targets simultaneously
+    - Quantile-to-quantile evaluation metrics
+    - Per-quantile scalers and sample weights
