transformer
docs/ideas.md (new file, 13 lines)
@@ -0,0 +1,13 @@
# Ideas & Hypotheses
## Transformer as backbone (Plan B)
- Hypothesis: self-attention may better capture long-range dependencies and reduce conflict between temporal consistency and distribution matching.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config.
- Experiment: compare GRU vs Transformer using `run_compare.py`; a backbone-selection sketch follows below.
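
A minimal sketch of how the config switch could select between the two backbones, assuming a PyTorch setup. Only `backbone_type = "transformer"` comes from the actual config; `build_backbone`, `input_dim`, `hidden_dim`, `num_layers`, and `num_heads` are illustrative names.

```python
import torch.nn as nn

def build_backbone(cfg: dict) -> nn.Module:
    """Return the sequence backbone named by cfg["backbone_type"] (illustrative sketch)."""
    if cfg["backbone_type"] == "gru":
        # Baseline GRU: cheap, strong local temporal consistency.
        return nn.GRU(
            input_size=cfg["input_dim"],
            hidden_size=cfg["hidden_dim"],
            num_layers=cfg["num_layers"],
            batch_first=True,
        )
    if cfg["backbone_type"] == "transformer":
        # Plan B: self-attention over the full window for long-range dependencies.
        # Assumes inputs are projected to hidden_dim before reaching the encoder.
        layer = nn.TransformerEncoderLayer(
            d_model=cfg["hidden_dim"],
            nhead=cfg["num_heads"],
            dim_feedforward=4 * cfg["hidden_dim"],
            batch_first=True,
        )
        return nn.TransformerEncoder(layer, num_layers=cfg["num_layers"])
    raise ValueError(f"unknown backbone_type: {cfg['backbone_type']}")
```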
## Residual standardization
- Hypothesis: standardizing residuals before passing them to the diffusion model reduces drift and improves the KS (Kolmogorov–Smirnov) metric; see the sketch below.
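
A minimal sketch of the standardization step, assuming residuals (observed minus temporal prediction) are z-scored per feature before diffusion training and de-standardized at sampling time; the class and method names are illustrative rather than the repo's actual API.

```python
import numpy as np

class ResidualStandardizer:
    """Z-score residuals before diffusion; invert at sampling time (sketch)."""

    def fit(self, residuals: np.ndarray) -> "ResidualStandardizer":
        # Statistics come from training residuals only and are reused at inference.
        self.mean_ = residuals.mean(axis=0)
        self.std_ = residuals.std(axis=0) + 1e-8  # guard against zero variance
        return self

    def transform(self, residuals: np.ndarray) -> np.ndarray:
        return (residuals - self.mean_) / self.std_

    def inverse_transform(self, z: np.ndarray) -> np.ndarray:
        return z * self.std_ + self.mean_
```

Under this scheme the diffusion model would train on `transform(residuals)`, and sampled residuals would pass through `inverse_transform` before being added back to the temporal prediction.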
## Two-stage training with curriculum
- Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low prediction error; see the sketch below.
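
A minimal sketch of how this curriculum could be wired up, assuming a PyTorch loop in which `gru` maps an input window to predictions in the target space and the diffusion model exposes a `loss()` method; the function name, learning rates, and threshold are placeholders, not the repo's actual API.

```python
import torch

def train_two_stage(gru, diffusion, train_loader, val_loader,
                    gru_epochs=50, diff_epochs=100, val_threshold=0.05):
    """Two-stage curriculum (sketch): diffusion starts only once the GRU is accurate."""
    mse = torch.nn.MSELoss()
    gru_opt = torch.optim.Adam(gru.parameters(), lr=1e-3)

    # Stage 1: fit the temporal GRU model alone.
    for _ in range(gru_epochs):
        for x, y in train_loader:
            loss = mse(gru(x), y)
            gru_opt.zero_grad()
            loss.backward()
            gru_opt.step()
        with torch.no_grad():
            val_err = sum(mse(gru(x), y).item() for x, y in val_loader) / len(val_loader)
        if val_err < val_threshold:
            break  # curriculum gate: stop stage 1 once prediction error is low

    # Stage 2: freeze the GRU and train the diffusion model on its residuals.
    for p in gru.parameters():
        p.requires_grad_(False)
    diff_opt = torch.optim.Adam(diffusion.parameters(), lr=1e-3)
    for _ in range(diff_epochs):
        for x, y in train_loader:
            with torch.no_grad():
                residual = y - gru(x)  # residual the diffusion model must match
            loss = diffusion.loss(residual)  # assumed: diffusion exposes a loss() method
            diff_opt.zero_grad()
            loss.backward()
            diff_opt.step()
```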