transformer

commit 334db7082b
parent 65391910a2
Date: 2026-01-27 00:41:42 +08:00
12 changed files with 175 additions and 11 deletions

docs/decisions.md (new file, +35)

@@ -0,0 +1,35 @@
# Design & Decision Log
## 2026-01-26 — Two-stage temporal backbone (GRU) + residual diffusion
- **Decision**: Add a stage-1 GRU trend model, then train diffusion on residuals.
- **Why**: Separate temporal consistency (handled by the GRU trend) from distribution alignment (handled by diffusion on the residuals).
- **Files**:
- `example/hybrid_diffusion.py` (added `TemporalGRUGenerator`)
- `example/train.py` (two-stage training + residual diffusion)
- `example/sample.py`, `example/export_samples.py` (trend + residual synthesis)
- `example/config.json` (temporal hyperparameters)
- **Expected effect**: improves lag-1 consistency; may hurt the KS statistic if the residual distribution drifts. A sketch of the two-stage pipeline follows this entry.
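A minimal sketch of the two-stage idea, assuming inputs of shape `(batch, seq, features)`. `TemporalGRUGenerator` is named above, but its constructor arguments, the cosine noise schedule, and the denoiser interface here are illustrative assumptions, not the repo's actual API:

```python
import torch
import torch.nn as nn

class TemporalGRUGenerator(nn.Module):
    """Stage 1: GRU trend model over the sequence (constructor args assumed)."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        h, _ = self.gru(x)
        return self.head(h)  # predicted trend, same shape as x

def residual_diffusion_loss(denoiser, residual, T: int = 1000):
    """Stage 2: standard epsilon-prediction loss, applied to residuals, not raw x0."""
    t = torch.randint(0, T, (residual.size(0),), device=residual.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / T).pow(2).view(-1, 1, 1)
    noise = torch.randn_like(residual)
    x_t = alpha_bar.sqrt() * residual + (1.0 - alpha_bar).sqrt() * noise
    return (denoiser(x_t, t) - noise).pow(2).mean()

def two_stage_step(x, trend_model, denoiser):
    trend = trend_model(x)
    trend_loss = (trend - x).pow(2).mean()
    residual = (x - trend).detach()  # the diffusion stage never sees the raw series
    return trend_loss + residual_diffusion_loss(denoiser, residual)
```

At sampling time the stages compose the other way: the GRU generates the trend, the diffusion sampler draws a residual, and the output is their sum (the trend + residual synthesis in `sample.py` / `export_samples.py`).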
## 2026-01-26 — Residual distribution alignment losses
- **Decision**: Apply distribution losses to residuals (not raw x0).
- **Why**: The diffusion stage models residuals, so alignment losses should target the residual distribution rather than the raw series (sketched after the file list).
- **Files**:
- `example/train.py` (quantile loss on residuals)
- `example/config.json` (quantile weight)
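One plausible form of that quantile loss, matching empirical residual quantiles per feature on a fixed grid; the function name, the grid, and the flattening choice are assumptions, not necessarily what `train.py` does:

```python
import torch

def residual_quantile_loss(gen_residual, real_residual,
                           qs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Align marginal residual distributions at a fixed quantile grid.
    Inputs are (batch, seq, features); grid and names are assumptions."""
    q = torch.tensor(qs, device=gen_residual.device, dtype=gen_residual.dtype)
    g = gen_residual.flatten(0, 1)    # (batch*seq, features)
    r = real_residual.flatten(0, 1)
    gq = torch.quantile(g, q, dim=0)  # (len(qs), features)
    rq = torch.quantile(r, q, dim=0)
    return (gq - rq).abs().mean()
```

In training this term would be added to the diffusion objective, scaled by the quantile weight from `config.json`.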
## 2026-01-26 — SNR-weighted loss + residual stats
- **Decision**: Add SNR-weighted loss and residual mean/std regularization.
- **Why**: Down-weight low-SNR timesteps to stabilize diffusion training, and keep generated residual mean/std near the data's to improve KS (sketched after the file list).
- **Files**:
- `example/train.py`
- `example/config.json`
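A hedged sketch of both pieces. The min-SNR-style clamp is one common weighting scheme; the `gamma` value, the epsilon-prediction framing, and the moment penalty are assumptions about the implementation:

```python
import torch

def snr_weighted_eps_loss(eps_pred, eps, alpha_bar_t, gamma: float = 5.0):
    """Weight each sample's epsilon-MSE by min(SNR, gamma) / SNR so nearly
    noise-free timesteps don't dominate. Formula and gamma are assumptions."""
    snr = alpha_bar_t / (1.0 - alpha_bar_t)  # SNR at each sampled timestep
    w = torch.clamp(snr, max=gamma) / snr
    per_sample = (eps_pred - eps).pow(2).flatten(1).mean(dim=1)
    return (w * per_sample).mean()

def residual_stat_penalty(gen_residual, real_residual):
    """Keep generated residual mean/std close to the data's, per feature."""
    mean_gap = (gen_residual.mean(dim=(0, 1)) - real_residual.mean(dim=(0, 1))).pow(2)
    std_gap = (gen_residual.std(dim=(0, 1)) - real_residual.std(dim=(0, 1))).pow(2)
    return mean_gap.mean() + std_gap.mean()
```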
## 2026-01-26 — Switchable backbone (GRU vs Transformer)
- **Decision**: Make the diffusion backbone configurable (`backbone_type`) with a Transformer encoder option.
- **Why**: Test whether self-attention reduces the temporal-vs-distribution competition without altering the two-stage design (a config-dispatch sketch follows the file list).
- **Files**:
- `example/hybrid_diffusion.py`
- `example/train.py`
- `example/sample.py`
- `example/export_samples.py`
- `example/config.json`
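What the switch could look like, assuming a small factory in `hybrid_diffusion.py`; `backbone_type` comes from the log above, while the class choices, config keys, and defaults here are hypothetical:

```python
import json
import torch.nn as nn

def build_backbone(cfg: dict) -> nn.Module:
    """Dispatch on cfg['backbone_type'] ('gru' or 'transformer').
    Config keys other than backbone_type are illustrative assumptions."""
    kind = cfg.get("backbone_type", "gru")
    d = cfg.get("hidden_dim", 128)
    if kind == "gru":
        return nn.GRU(input_size=cfg["n_features"], hidden_size=d, batch_first=True)
    if kind == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=cfg.get("n_heads", 4),
            dim_feedforward=4 * d, batch_first=True)
        return nn.TransformerEncoder(layer, num_layers=cfg.get("n_layers", 2))
    raise ValueError(f"unknown backbone_type: {kind!r}")

# Usage: cfg = json.load(open("example/config.json")); net = build_backbone(cfg)
```

Keeping the switch at the backbone level means `train.py`, `sample.py`, and `export_samples.py` only need to read `backbone_type` from the config, leaving the two-stage pipeline itself unchanged.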