Ideas & Hypotheses
Transformer as backbone (Plan B)
- Hypothesis: self-attention may better capture long-range dependencies and reduce conflict between temporal consistency and distribution matching.
- Risk: higher compute cost, potentially more unstable training.
- Status: implemented as `backbone_type = "transformer"` in config.
- Experiment: compare GRU vs Transformer using `run_compare.py`.
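
A minimal sketch of how the `backbone_type` switch could be wired up; apart from the `backbone_type` key itself, the class names, config keys, and dimensions below are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn


class GRUBackbone(nn.Module):
    """Recurrent backbone: the hidden state carries temporal context."""

    def __init__(self, input_dim: int, hidden_dim: int, num_layers: int = 2):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)  # (batch, seq_len, hidden_dim)
        return out


class TransformerBackbone(nn.Module):
    """Self-attention backbone: every step attends to every other step."""

    def __init__(self, input_dim: int, hidden_dim: int,
                 num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(input_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.proj(x))  # (batch, seq_len, hidden_dim)


def build_backbone(cfg: dict) -> nn.Module:
    """Select the backbone from the config's `backbone_type` key."""
    if cfg["backbone_type"] == "transformer":
        return TransformerBackbone(cfg["input_dim"], cfg["hidden_dim"])
    return GRUBackbone(cfg["input_dim"], cfg["hidden_dim"])
```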
Residual standardization
- Hypothesis: standardizing residuals before diffusion reduces drift and improves KS.
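
A sketch of the standardization step, assuming residuals (data minus backbone prediction) are held in a `(samples, time, features)` array; the class and attribute names are illustrative:

```python
import numpy as np


class ResidualStandardizer:
    """Standardize residuals before feeding them to the diffusion model."""

    def fit(self, residuals: np.ndarray) -> "ResidualStandardizer":
        # Per-feature statistics over the (samples, time, features) array.
        self.mean_ = residuals.mean(axis=(0, 1), keepdims=True)
        self.std_ = residuals.std(axis=(0, 1), keepdims=True) + 1e-8
        return self

    def transform(self, residuals: np.ndarray) -> np.ndarray:
        return (residuals - self.mean_) / self.std_

    def inverse_transform(self, z: np.ndarray) -> np.ndarray:
        # Undo the scaling when adding sampled residuals back onto the
        # backbone's prediction at generation time.
        return z * self.std_ + self.mean_
```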
Two-stage training with curriculum
- Hypothesis: training the diffusion model on residuals only after the temporal GRU has converged to low error should outperform joint training.
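
A sketch of the curriculum, assuming the data loader yields `(input, target)` pairs and the diffusion model exposes a `loss()` method on residual batches; none of these names come from the repo:

```python
import torch
import torch.nn.functional as F


def train_two_stage(backbone, diffusion, loader, *,
                    backbone_epochs=50, diffusion_epochs=100,
                    converge_tol=1e-3, lr=1e-3, device="cpu"):
    """Stage 1: fit the temporal backbone; Stage 2: fit diffusion on residuals."""
    backbone.to(device)
    diffusion.to(device)

    # ---- Stage 1: temporal model only ----
    opt = torch.optim.Adam(backbone.parameters(), lr=lr)
    for _ in range(backbone_epochs):
        epoch_loss = 0.0
        for x, target in loader:
            x, target = x.to(device), target.to(device)
            loss = F.mse_loss(backbone(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < converge_tol:
            break  # backbone has converged to low error; move to stage 2

    # ---- Stage 2: diffusion on residuals, backbone frozen ----
    for p in backbone.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(diffusion.parameters(), lr=lr)
    for _ in range(diffusion_epochs):
        for x, target in loader:
            x, target = x.to(device), target.to(device)
            residual = target - backbone(x)
            loss = diffusion.loss(residual)  # assumed diffusion API
            opt.zero_grad()
            loss.backward()
            opt.step()
```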
Discrete calibration
- Hypothesis: post-hoc calibration on discrete marginals can reduce JSD without harming KS.
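
A first-cut sketch of such a calibration: resample each discrete column from the real marginal, leaving continuous columns (and hence their KS scores) untouched. The column indices and the marginal-matching strategy are assumptions, not the repo's method:

```python
import numpy as np


def calibrate_discrete_marginals(synthetic, real, discrete_cols, rng=None):
    """Resample discrete columns of `synthetic` so their marginals match `real`.

    Continuous columns are left untouched, so KS on those columns is
    unaffected; only the discrete marginals (and hence JSD) are adjusted.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = synthetic.copy()
    n = len(out)
    for col in discrete_cols:
        values, counts = np.unique(real[:, col], return_counts=True)
        target_probs = counts / counts.sum()
        # Draw a fresh column from the real marginal distribution.
        out[:, col] = rng.choice(values, size=n, p=target_probs)
    return out
```

Note that this matches the discrete marginals exactly but ignores dependence between columns; a count-preserving reassignment that keeps each row's original category where possible would be the natural refinement if joint structure degrades.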