Problem
Age weighting exists, but scores are not normalized across time, making historical comparisons unreliable.
Basis of issue
- Temporal normalization layer across cohorts
- Score decay or relevance adjustment over time
- Cross-cohort score alignment
- Separation of raw vs normalized scores
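One possible shape for the normalization layer, as a minimal sketch only: z-score each raw score against the other scores in its own cohort, and keep the raw value alongside the normalized one. The record layout `(cohort, model_id, raw_score)` and the function name `normalize_by_cohort` are hypothetical and not taken from the existing codebase. Per-cohort z-scoring is also just one candidate method; it expresses standing relative to cohort peers, which may or may not be the alignment the paper criterion intends.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_cohort(records):
    """Return one dict per record holding both the raw and the cohort z-score.

    records: iterable of (cohort, model_id, raw_score) tuples (hypothetical layout).
    """
    records = list(records)  # iterated twice below

    # Collect the raw scores of each cohort.
    by_cohort = defaultdict(list)
    for cohort, _model_id, raw in records:
        by_cohort[cohort].append(raw)

    # Per-cohort mean and population standard deviation.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_cohort.items()}

    out = []
    for cohort, model_id, raw in records:
        mu, sigma = stats[cohort]
        # A cohort with zero spread carries no ranking information; use 0.0.
        z = 0.0 if sigma == 0 else (raw - mu) / sigma
        out.append(
            {"cohort": cohort, "model_id": model_id, "raw": raw, "normalized": z}
        )
    return out
```

With this shape, a model scoring 0.6 in a cohort whose mean is 0.5 and a model scoring 0.9 in a cohort whose mean is 0.8 get the same normalized value when both cohorts have the same spread, while their raw scores stay available unchanged.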
Importance
- Paper criterion #6: longitudinal normalized evaluation
- Model capabilities improve over time
- Raw scores become misleading without normalization
Current implementation gap
- Linear / exponential age weighting only
- No true temporal normalization
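To make the gap concrete, here is a rough illustration of what linear and exponential age weighting typically look like. This is not the project's actual code: the function names and the `max_age_days` and `half_life_days` parameters are assumptions chosen for the example.

```python
def linear_age_weight(age_days, max_age_days=365.0):
    """Weight falls linearly from 1.0 at age 0 to 0.0 at max_age_days."""
    return max(0.0, 1.0 - age_days / max_age_days)


def exponential_age_weight(age_days, half_life_days=90.0):
    """Weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def age_weighted_mean(scores_with_age, weight_fn):
    """Weighted mean of (raw_score, age_days) pairs under the given weight function."""
    pairs = [(score, weight_fn(age)) for score, age in scores_with_age]
    total = sum(w for _, w in pairs)
    if total == 0:
        return 0.0
    return sum(score * w for score, w in pairs) / total
```

Both functions only change how much an old score counts. The result of `age_weighted_mean` is still on the raw scale, so if older and newer evaluations were scored under different conditions, the weighting does nothing to align them.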
Implementation checklist
- Normalized score metric independent of evaluation date
- Cohort-aware score calibration
- Historical scores remain comparable
- Clear distinction between raw and normalized scores
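A sketch of how the calibration and raw/normalized separation items could fit together, assuming a small set of anchor models is re-scored in every cohort. The `ScoreRecord` type, the `calibrate_to_reference` function, and the anchor layout are all hypothetical. Shifting by the anchor mean is the simplest possible calibration and is used here only for illustration; a real implementation might also need to rescale.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class ScoreRecord:
    model_id: str
    cohort: str
    raw: float                          # as measured, never overwritten
    normalized: Optional[float] = None  # filled in by calibration


def calibrate_to_reference(records, anchors, reference_cohort):
    """Shift each cohort so its anchor mean matches the reference cohort's.

    anchors: dict mapping cohort -> {anchor_model_id: raw_score}; assumes the
    same anchor models were scored in every cohort (an assumption of this sketch).
    """
    ref_mean = mean(anchors[reference_cohort].values())
    calibrated = []
    for r in records:
        # Offset that moves this cohort's anchors onto the reference scale.
        offset = ref_mean - mean(anchors[r.cohort].values())
        calibrated.append(
            ScoreRecord(r.model_id, r.cohort, r.raw, r.raw + offset)
        )
    return calibrated
```

Because `ScoreRecord` is frozen and calibration returns new records, the raw score stays intact and the normalized value can be recomputed whenever the reference cohort or the anchor set changes.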