transformer

2026-01-27 00:41:42 +08:00
parent 65391910a2
commit 334db7082b
12 changed files with 175 additions and 11 deletions

docs/ideas.md (new file, +13)

@@ -0,0 +1,13 @@
# Ideas & Hypotheses
## Transformer as backbone (Plan B)
- Hypothesis: self-attention may capture long-range dependencies better than a recurrent backbone and reduce the conflict between the temporal-consistency and distribution-matching objectives.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config (a hypothetical sketch of the switch follows this list).
- Experiment: compare GRU vs Transformer using `run_compare.py`.
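A minimal sketch of what the backbone switch could look like. Only `backbone_type = "transformer"` is confirmed by this commit; the other field names, dimensions, and the `build_backbone` helper are illustrative assumptions, not the repo's actual code.

```python
# Hypothetical config/factory sketch; only backbone_type = "transformer"
# is confirmed by the commit, everything else is illustrative.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ModelConfig:
    backbone_type: str = "gru"  # "gru" or "transformer"
    hidden_dim: int = 128       # assumed shared width for both backbones
    num_layers: int = 2
    num_heads: int = 4          # only used by the transformer backbone


def build_backbone(cfg: ModelConfig) -> nn.Module:
    if cfg.backbone_type == "gru":
        return nn.GRU(cfg.hidden_dim, cfg.hidden_dim,
                      cfg.num_layers, batch_first=True)
    if cfg.backbone_type == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.hidden_dim, nhead=cfg.num_heads, batch_first=True)
        return nn.TransformerEncoder(layer, num_layers=cfg.num_layers)
    raise ValueError(f"unknown backbone_type: {cfg.backbone_type}")
```

Keeping a single factory behind the config flag lets `run_compare.py` swap backbones without touching the rest of the training loop.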
## Residual standardization
- Hypothesis: standardizing residuals before diffusion reduces drift and improves the Kolmogorov-Smirnov (KS) statistic (see the sketch below).
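A sketch of the residual-standardization idea, assuming a `(batch, time, feature)` array split into a temporal prediction and a residual; all names here are hypothetical.

```python
# Hypothetical sketch: standardize residuals r = x - x_hat per feature
# before handing them to the diffusion model, and keep the statistics
# so generated residuals can be mapped back to the original scale.
import numpy as np


def standardize_residuals(x: np.ndarray, x_hat: np.ndarray, eps: float = 1e-8):
    r = x - x_hat                              # residual the diffusion model sees
    mu = r.mean(axis=(0, 1), keepdims=True)    # per-feature mean over batch/time
    sigma = r.std(axis=(0, 1), keepdims=True)  # per-feature std over batch/time
    return (r - mu) / (sigma + eps), mu, sigma


def destandardize(r_std: np.ndarray, mu, sigma, eps: float = 1e-8):
    return r_std * (sigma + eps) + mu          # invert after sampling
```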
## Two-stage training with curriculum
- Hypothesis: train diffusion on residuals only after the temporal GRU converges to low error (a sketch of the schedule follows).
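A sketch of the curriculum under stated assumptions: `temporal_model` maps `x[:, :-1]` to a next-step prediction and `diffusion_model.loss` returns a trainable loss on residual batches. The thresholds, optimizers, and both model APIs are assumptions, not the repo's code.

```python
# Hypothetical two-stage schedule: stage 1 trains the temporal backbone
# until its prediction error is low; stage 2 freezes it and trains the
# diffusion model on the residuals it leaves behind.
import torch


def train_two_stage(temporal_model, diffusion_model, loader, opt_t, opt_d,
                    stage1_threshold=1e-3, max_stage1_epochs=100):
    mse = torch.nn.MSELoss()

    # Stage 1: next-step prediction with the temporal (e.g. GRU) backbone.
    for _ in range(max_stage1_epochs):
        total, batches = 0.0, 0
        for x in loader:                       # x: (batch, time, features)
            pred = temporal_model(x[:, :-1])   # predict x[t+1] from x[<=t]
            loss = mse(pred, x[:, 1:])
            opt_t.zero_grad()
            loss.backward()
            opt_t.step()
            total += loss.item()
            batches += 1
        if total / batches < stage1_threshold:
            break                              # curriculum gate: low error reached

    # Stage 2: freeze the temporal model; diffuse only its residuals.
    for p in temporal_model.parameters():
        p.requires_grad_(False)
    for x in loader:
        with torch.no_grad():
            residual = x[:, 1:] - temporal_model(x[:, :-1])
        loss = diffusion_model.loss(residual)  # assumed diffusion-loss API
        opt_d.zero_grad()
        loss.backward()
        opt_d.step()
```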