transformer
docs/README.md (Normal file, 12 lines added)
@@ -0,0 +1,12 @@
# Documentation Index

This folder tracks project decisions, experiments, and evolving ideas.

- `decisions.md`: design/architecture changes and rationales
- `experiments.md`: experiment runs and results
- `ideas.md`: future ideas and hypotheses

Conventions:
- Append new entries instead of overwriting old ones.
- Record exact config file and key overrides when possible.
- Keep metrics in the order: avg_ks / avg_jsd / avg_lag1_diff.
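
A minimal sketch of how these three metrics could be computed, assuming `avg_ks` is the mean per-feature two-sample KS statistic, `avg_jsd` the mean per-feature Jensen-Shannon divergence over shared histogram bins, and `avg_lag1_diff` the mean absolute difference in lag-1 autocorrelation between real and synthetic series; the authoritative definitions live in the evaluation code, so treat this only as an illustration of the reporting convention.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a (num_series, seq_len) array, pooled over series."""
    return np.corrcoef(x[:, :-1].ravel(), x[:, 1:].ravel())[0, 1]

def evaluate(real, synth, bins=50):
    """real/synth: (num_series, seq_len, num_features) arrays of time series."""
    ks_vals, jsd_vals, lag1_vals = [], [], []
    for f in range(real.shape[-1]):
        r, s = real[..., f].ravel(), synth[..., f].ravel()
        ks_vals.append(ks_2samp(r, s).statistic)
        edges = np.histogram_bin_edges(np.concatenate([r, s]), bins=bins)
        p, _ = np.histogram(r, bins=edges, density=True)
        q, _ = np.histogram(s, bins=edges, density=True)
        jsd_vals.append(jensenshannon(p, q) ** 2)  # squared JS distance = divergence
        lag1_vals.append(abs(lag1_autocorr(real[..., f]) - lag1_autocorr(synth[..., f])))
    # Report in the conventional order: avg_ks / avg_jsd / avg_lag1_diff.
    return float(np.mean(ks_vals)), float(np.mean(jsd_vals)), float(np.mean(lag1_vals))
```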
docs/decisions.md (Normal file, 35 lines added)
@@ -0,0 +1,35 @@
# Design & Decision Log

## 2026-01-26 — Two-stage temporal backbone (GRU) + residual diffusion
- **Decision**: Add a stage-1 GRU trend model, then train diffusion on residuals.
- **Why**: Separate temporal consistency from distribution alignment.
- **Files**:
- `example/hybrid_diffusion.py` (added `TemporalGRUGenerator`)
- `example/train.py` (two-stage training + residual diffusion)
- `example/sample.py`, `example/export_samples.py` (trend + residual synthesis)
- `example/config.json` (temporal hyperparameters)
- **Expected effect**: improve lag-1 consistency; may hurt KS if residual distribution drifts.
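
A rough sketch of the two-stage idea for readers of this log. The class below only mirrors the role of `TemporalGRUGenerator`; its name, signature, and sizes are assumptions, not the actual code in `example/hybrid_diffusion.py`.

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Stage-1 trend model (illustrative stand-in for TemporalGRUGenerator)."""
    def __init__(self, num_features, hidden_size=64):
        super().__init__()
        self.gru = nn.GRU(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_features)

    def forward(self, x):        # x: (batch, seq_len, num_features)
        h, _ = self.gru(x)
        return self.head(h)      # smooth trend, same shape as x

def residuals_for_diffusion(x, trend_model):
    """Stage 2 trains the diffusion model on x - trend instead of raw x."""
    with torch.no_grad():
        trend = trend_model(x)
    return x - trend
```

At sampling time the stages are recombined: residuals drawn from the diffusion model are added back onto the GRU trend.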

## 2026-01-26 — Residual distribution alignment losses
- **Decision**: Apply distribution losses to residuals (not raw x0).
- **Why**: The diffusion stage models residuals, so alignment should target the residual distribution.
- **Files**:
- `example/train.py` (quantile loss on residuals)
- `example/config.json` (quantile weight)
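
A minimal sketch of what a quantile loss on residuals can look like; the quantile levels and the config key are assumptions, the actual loss lives in `example/train.py`.

```python
import torch

def residual_quantile_loss(pred_res, true_res, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Match empirical quantiles of predicted vs. real residuals, per feature."""
    q = torch.tensor(quantiles, device=pred_res.device, dtype=pred_res.dtype)
    # Pool batch and time so the quantiles describe the residual distribution.
    pred_q = torch.quantile(pred_res.flatten(0, 1), q, dim=0)
    true_q = torch.quantile(true_res.flatten(0, 1), q, dim=0)
    return torch.mean(torch.abs(pred_q - true_q))

# Hypothetical use inside the training step (key name assumed):
# loss = diffusion_loss + cfg["quantile_weight"] * residual_quantile_loss(res_hat, res)
```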

## 2026-01-26 — SNR-weighted loss + residual stats
- **Decision**: Add SNR-weighted loss and residual mean/std regularization.
- **Why**: Stabilize diffusion training and improve KS.
- **Files**:
- `example/train.py`
- `example/config.json`
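
A hedged sketch of both ingredients: the weighting follows the common min-SNR style (clip the signal-to-noise ratio and divide by it), and the stats penalty pulls generated residual mean/std toward the real residuals; the exact formulation in `example/train.py` may differ.

```python
import torch

def snr_weighted_mse(noise_pred, noise, alphas_cumprod, t, gamma=5.0):
    """Weight the denoising MSE by min(SNR, gamma) / SNR per timestep (illustrative)."""
    snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])   # per-sample SNR, shape (batch,)
    weight = torch.clamp(snr, max=gamma) / snr             # down-weights easy, high-SNR steps
    per_sample = ((noise_pred - noise) ** 2).flatten(1).mean(dim=1)
    return (weight * per_sample).mean()

def residual_stats_penalty(res_hat, res_real):
    """Pull generated residual mean/std toward the real residuals' statistics."""
    return ((res_hat.mean() - res_real.mean()) ** 2
            + (res_hat.std() - res_real.std()) ** 2)
```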

## 2026-01-26 — Switchable backbone (GRU vs Transformer)
- **Decision**: Make the diffusion backbone configurable (`backbone_type`) with a Transformer encoder option.
- **Why**: Test whether self-attention reduces temporal vs distribution competition without altering the two-stage design.
- **Files**:
- `example/hybrid_diffusion.py`
- `example/train.py`
- `example/sample.py`
- `example/export_samples.py`
- `example/config.json`
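
A minimal sketch of how the `backbone_type` switch might be wired inside the denoiser; apart from `backbone_type`, the config keys and layer sizes are assumptions.

```python
import torch.nn as nn

def build_backbone(cfg):
    """Return a sequence encoder for the denoiser based on cfg['backbone_type']."""
    d_model = cfg.get("hidden_size", 128)            # hypothetical key
    backbone = cfg.get("backbone_type", "gru")
    if backbone == "gru":
        return nn.GRU(d_model, d_model, batch_first=True)
    if backbone == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True
        )
        return nn.TransformerEncoder(layer, num_layers=cfg.get("num_layers", 2))
    raise ValueError(f"unknown backbone_type: {backbone}")

# Note: nn.GRU returns (output, hidden) while TransformerEncoder returns a tensor,
# so the caller has to normalize the output before the denoising head.
```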

docs/experiments.md (Normal file, 29 lines added)
@@ -0,0 +1,29 @@
# Experiment Log

## Format
```
YYYY-MM-DD
- Config: <config file or key overrides>
- Result: avg_ks / avg_jsd / avg_lag1_diff
- Notes
```

## 2026-01-26
- Config: `example/config_no_temporal.json` (baseline)
- Result: 0.6474156 / 0.0576699 / 0.1981700
- Notes: no temporal stage; better KS, worse lag-1.

## 2026-01-26
- Config: `example/config_temporal_strong.json` (two-stage)
- Result: 0.6892453 / 0.0564408 / 0.1568776
- Notes: lag-1 improves, KS degrades; residual drift remains.

## 2026-01-26
- Config: `example/config.json` (two-stage residual diffusion; user run on Windows)
- Result: 0.7131993 / 0.0327603 / 0.2327633
- Notes: user-reported metrics after temporal stage + residual diffusion.

## 2026-01-26
- Config: `example/config.json` (two-stage residual diffusion; user run on Windows)
- Result: 0.7096230 / 0.0331810 / 0.1898416
- Notes: slight KS improvement, lag-1 improves; still distribution/temporal trade-off.

docs/ideas.md (Normal file, 13 lines added)
@@ -0,0 +1,13 @@
# Ideas & Hypotheses

## Transformer as backbone (Plan B)
- Hypothesis: self-attention may better capture long-range dependencies and reduce the conflict between temporal consistency and distribution matching.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config.
- Experiment: compare GRU vs Transformer using `run_compare.py`.
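
One lightweight way to set up that comparison is to derive two configs that differ only in the backbone (sketch; the derived file names are hypothetical, only `backbone_type` and `example/config.json` appear in this repo):

```python
import json

with open("example/config.json") as f:
    base = json.load(f)

for backbone in ("gru", "transformer"):
    cfg = dict(base, backbone_type=backbone)
    out = f"example/config_{backbone}.json"   # hypothetical derived configs
    with open(out, "w") as f:
        json.dump(cfg, f, indent=2)
# run_compare.py can then evaluate both runs with the same metric order.
```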

## Residual standardization
- Hypothesis: standardizing residuals before diffusion reduces drift and improves KS.
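
A minimal sketch of the hypothesis, assuming per-feature statistics pooled over batch and time; the class name is illustrative.

```python
import torch

class ResidualStandardizer:
    """Standardize residuals before diffusion; invert after sampling (illustrative)."""
    def fit(self, residuals):                 # residuals: (batch, seq_len, num_features)
        self.mean = residuals.mean(dim=(0, 1), keepdim=True)
        self.std = residuals.std(dim=(0, 1), keepdim=True).clamp_min(1e-6)
        return self

    def transform(self, residuals):
        return (residuals - self.mean) / self.std

    def inverse(self, standardized):
        return standardized * self.std + self.mean

# Sampling would then be: x = trend + standardizer.inverse(diffusion_samples)
```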

## Two-stage training with curriculum
- Hypothesis: train diffusion on residuals only after temporal GRU converges to low error.
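
A sketch of one possible gate; the threshold and patience values are made up for illustration.

```python
def ready_for_stage_two(gru_val_losses, threshold=0.05, patience=3):
    """Begin residual diffusion training only after the GRU trend model has stayed
    below `threshold` validation loss for `patience` consecutive evaluations."""
    recent = gru_val_losses[-patience:]
    return len(recent) == patience and all(v < threshold for v in recent)

# Stage-1 loop sketch:
# for epoch in range(max_gru_epochs):
#     train_gru_epoch(...); val_losses.append(validate_gru(...))
#     if ready_for_stage_two(val_losses):
#         break  # freeze the GRU, compute residuals, start diffusion training
```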