transformer

2026-01-27 00:41:42 +08:00
parent 65391910a2
commit 334db7082b
12 changed files with 175 additions and 11 deletions

docs/ideas.md (new file, +13)

@@ -0,0 +1,13 @@
# Ideas & Hypotheses
## Transformer as backbone (Plan B)
- Hypothesis: self-attention may capture long-range dependencies better than a recurrent backbone and reduce the conflict between the temporal-consistency and distribution-matching objectives.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config (a hypothetical sketch of the switch follows this list).
- Experiment: compare GRU vs Transformer using `run_compare.py`.
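A minimal sketch of what the backbone switch could look like. Only `backbone_type = "transformer"` is confirmed by this commit; the other field names, dimensions, and the `build_backbone` helper are illustrative assumptions, not the repo's actual code.

```python
# Hypothetical config/factory sketch; only backbone_type = "transformer"
# is confirmed by the commit, everything else is illustrative.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ModelConfig:
    backbone_type: str = "gru"  # "gru" or "transformer"
    hidden_dim: int = 128       # assumed shared width for both backbones
    num_layers: int = 2
    num_heads: int = 4          # only used by the transformer backbone


def build_backbone(cfg: ModelConfig) -> nn.Module:
    if cfg.backbone_type == "gru":
        return nn.GRU(cfg.hidden_dim, cfg.hidden_dim,
                      cfg.num_layers, batch_first=True)
    if cfg.backbone_type == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.hidden_dim, nhead=cfg.num_heads, batch_first=True)
        return nn.TransformerEncoder(layer, num_layers=cfg.num_layers)
    raise ValueError(f"unknown backbone_type: {cfg.backbone_type}")
```

Keeping a single factory behind the config flag lets `run_compare.py` swap backbones without touching the rest of the training loop.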
## Residual standardization
- Hypothesis: standardizing residuals before diffusion reduces drift and improves the Kolmogorov-Smirnov (KS) statistic (see the sketch below).
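A sketch of the residual-standardization idea, assuming a `(batch, time, feature)` array split into a temporal prediction and a residual; all names here are hypothetical.

```python
# Hypothetical sketch: standardize residuals r = x - x_hat per feature
# before handing them to the diffusion model, and keep the statistics
# so generated residuals can be mapped back to the original scale.
import numpy as np


def standardize_residuals(x: np.ndarray, x_hat: np.ndarray, eps: float = 1e-8):
    r = x - x_hat                              # residual the diffusion model sees
    mu = r.mean(axis=(0, 1), keepdims=True)    # per-feature mean over batch/time
    sigma = r.std(axis=(0, 1), keepdims=True)  # per-feature std over batch/time
    return (r - mu) / (sigma + eps), mu, sigma


def destandardize(r_std: np.ndarray, mu, sigma, eps: float = 1e-8):
    return r_std * (sigma + eps) + mu          # invert after sampling
```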
## Two-stage training with curriculum
- Hypothesis: train diffusion on residuals only after the temporal GRU converges to low error (a sketch of the schedule follows).
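A sketch of the curriculum under stated assumptions: `temporal_model` maps `x[:, :-1]` to a next-step prediction and `diffusion_model.loss` returns a trainable loss on residual batches. The thresholds, optimizers, and both model APIs are assumptions, not the repo's code.

```python
# Hypothetical two-stage schedule: stage 1 trains the temporal backbone
# until its prediction error is low; stage 2 freezes it and trains the
# diffusion model on the residuals it leaves behind.
import torch


def train_two_stage(temporal_model, diffusion_model, loader, opt_t, opt_d,
                    stage1_threshold=1e-3, max_stage1_epochs=100):
    mse = torch.nn.MSELoss()

    # Stage 1: next-step prediction with the temporal (e.g. GRU) backbone.
    for _ in range(max_stage1_epochs):
        total, batches = 0.0, 0
        for x in loader:                       # x: (batch, time, features)
            pred = temporal_model(x[:, :-1])   # predict x[t+1] from x[<=t]
            loss = mse(pred, x[:, 1:])
            opt_t.zero_grad()
            loss.backward()
            opt_t.step()
            total += loss.item()
            batches += 1
        if total / batches < stage1_threshold:
            break                              # curriculum gate: low error reached

    # Stage 2: freeze the temporal model; diffuse only its residuals.
    for p in temporal_model.parameters():
        p.requires_grad_(False)
    for x in loader:
        with torch.no_grad():
            residual = x[:, 1:] - temporal_model(x[:, :-1])
        loss = diffusion_model.loss(residual)  # assumed diffusion-loss API
        opt_d.zero_grad()
        loss.backward()
        opt_d.step()
```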