This repo contains multiple time series forecasting methods for water quality analysis.
Building on these baseline methods, we propose a spatially informed, clustering-based masking strategy for self-supervised pretraining that explicitly incorporates spatial relationships among water sensors. Specifically, the approach groups geographically nearby sensors, then masks and reconstructs entire spatial clusters during pretraining. Because a masked sensor's close neighbors are masked with it, reconstruction has to draw on more distant sensors, which forces the model to learn long-range dependencies.
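The cluster-level masking described above could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation. It assumes sensor coordinates are available as latitude/longitude pairs, links sensors that lie within a distance threshold (75 km here, mirroring the `distance_threshold_km` setting), forms clusters as connected components of those links, and masks whole clusters until a target ratio is reached. The function names and the `mask_ratio` parameter are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(coords):
    """Pairwise great-circle distances (km) for an (N, 2) array of [lat, lon] in degrees."""
    lat = np.radians(coords[:, 0])[:, None]
    lon = np.radians(coords[:, 1])[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def cluster_sensors(coords, threshold_km=75.0):
    """Assumed clustering rule: connected components of the 'within threshold' graph."""
    n = len(coords)
    adjacent = haversine_km(coords) <= threshold_km
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over all sensors reachable from `start`.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(adjacent[node]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels


def cluster_mask(labels, mask_ratio, rng):
    """Boolean mask over sensors (True = masked); clusters are masked as whole units."""
    n = len(labels)
    mask = np.zeros(n, dtype=bool)
    for cluster_id in rng.permutation(np.unique(labels)):
        if mask.sum() >= mask_ratio * n:
            break
        mask[labels == cluster_id] = True
    return mask


# Illustrative coordinates only (not real sensor locations).
coords = np.array([
    [40.0, -75.0], [40.1, -75.1],   # two sensors a few km apart
    [45.0, -70.0], [45.2, -70.1],   # a second nearby pair
    [30.0, -90.0],                  # an isolated sensor
])
labels = cluster_sensors(coords, threshold_km=75.0)
mask = cluster_mask(labels, mask_ratio=0.4, rng=np.random.default_rng(0))
```

Because clusters are masked as whole units, the realized mask ratio can overshoot the target. A real implementation might instead pick clusters to match the ratio more closely.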
module load python3
source activate multimae
export PYTHONPATH=$PWD:$PYTHONPATH
python run_unified_experiments.py
- Models: LSTM, STD-MAE Random, STD-MAE Distance
python run_unified_experiments.py --quick
- Duration: ~30-60 minutes (10 pretrain + 5 downstream epochs)
- Purpose: Test pipeline functionality
# Run only LSTM and Random masking
python run_unified_experiments.py --models lstm,random
# Run only Distance masking
python run_unified_experiments.py --models distance
experiments/unified_comparison_[name]_[timestamp]/
├── hyperparameters.json # All hyperparameters used
├── detailed_results.json # Complete experimental results
├── comparison_summary.txt # Human-readable summary
├── lstm_results/ # LSTM outputs
│ ├── mae_matrix.csv
│ ├── r2_matrix.csv
│ ├── training_losses.csv
│ ├── heatmaps.png
│ └── forecast_vs_truth_grid.png
├── stdmae_random_results/ # Random masking outputs
│ └── [visualization files]
└── stdmae_distance_results/ # Distance masking outputs
  └── [visualization files]
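To inspect a finished run programmatically, something along these lines may help. It is a sketch that relies only on the directory and file names shown in the tree above. The helper names `latest_run` and `load_run` are hypothetical, picking the "latest" run by modification time is an assumption, and nothing is assumed about the keys inside the JSON files.

```python
import json
from pathlib import Path


def latest_run(root="experiments"):
    """Return the most recently modified unified_comparison_* directory under root."""
    runs = [p for p in Path(root).glob("unified_comparison_*") if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no unified_comparison_* directories under {root}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def load_run(run_dir):
    """Load whichever of the top-level JSON files exist, keyed by file stem."""
    run_dir = Path(run_dir)
    loaded = {}
    for name in ("hyperparameters.json", "detailed_results.json"):
        path = run_dir / name
        if path.exists():
            with path.open() as handle:
                loaded[path.stem] = json.load(handle)
    return loaded
```

For example, `load_run(latest_run())` returns a dictionary with `hyperparameters` and `detailed_results` entries when both files are present.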
- LSTM: MAE, R², MSE (per-sensor and overall)
- STD-MAE: Test MSE from downstream evaluation (we adapt STD-MAE from link)
- Visualizations: Heatmaps, forecast comparisons, training curves
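For reference, the per-sensor and overall metrics listed above can be computed as in the sketch below. It assumes predictions and targets are arranged as `(samples, sensors)` arrays and that "overall" means pooling every value across sensors. Both are assumptions about the layout rather than a description of the repository's evaluation code, and the function names are hypothetical.

```python
import numpy as np


def per_sensor_metrics(y_true, y_pred):
    """MAE, MSE and R² per sensor for arrays of shape (samples, sensors)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.abs(err).mean(axis=0)
    mse = (err ** 2).mean(axis=0)
    ss_res = (err ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    # R² is undefined for a constant target; report NaN instead of dividing by zero.
    r2 = 1.0 - ss_res / np.where(ss_tot == 0, np.nan, ss_tot)
    return {"mae": mae, "mse": mse, "r2": r2}


def overall_metrics(y_true, y_pred):
    """Pool all sensors into one series and compute the same three metrics."""
    flat_true = np.asarray(y_true, dtype=float).reshape(-1, 1)
    flat_pred = np.asarray(y_pred, dtype=float).reshape(-1, 1)
    pooled = per_sensor_metrics(flat_true, flat_pred)
    return {name: float(values[0]) for name, values in pooled.items()}


# Toy data: 3 samples, 2 sensors (illustrative values only).
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 33.0]])
per_sensor = per_sensor_metrics(y_true, y_pred)
overall = overall_metrics(y_true, y_pred)
```

Pooled R² can differ noticeably from the mean of per-sensor R² values when sensors have very different scales, so it is worth stating which one a reported number refers to.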
pred_len = 24 # Prediction sequence length
hidden = 128
distance_threshold_km = 75 # Distance masking threshold