transformer
docs/README.md (Normal file, 12 lines added)
@@ -0,0 +1,12 @@
# Documentation Index

This folder tracks project decisions, experiments, and evolving ideas.

- `decisions.md`: design/architecture changes and rationales
- `experiments.md`: experiment runs and results
- `ideas.md`: future ideas and hypotheses

Conventions:
- Append new entries instead of overwriting old ones.
- Record exact config file and key overrides when possible.
- Keep metrics in the order: avg_ks / avg_jsd / avg_lag1_diff.
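
A minimal sketch of how these three metrics could be computed, assuming `avg_ks` is the mean per-feature two-sample KS statistic, `avg_jsd` the mean per-feature Jensen-Shannon divergence over shared histogram bins, and `avg_lag1_diff` the mean absolute difference in lag-1 autocorrelation between real and synthetic series; the authoritative definitions live in the evaluation code, so treat this only as an illustration of the reporting convention.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a (num_series, seq_len) array, pooled over series."""
    return np.corrcoef(x[:, :-1].ravel(), x[:, 1:].ravel())[0, 1]

def evaluate(real, synth, bins=50):
    """real/synth: (num_series, seq_len, num_features) arrays of time series."""
    ks_vals, jsd_vals, lag1_vals = [], [], []
    for f in range(real.shape[-1]):
        r, s = real[..., f].ravel(), synth[..., f].ravel()
        ks_vals.append(ks_2samp(r, s).statistic)
        edges = np.histogram_bin_edges(np.concatenate([r, s]), bins=bins)
        p, _ = np.histogram(r, bins=edges, density=True)
        q, _ = np.histogram(s, bins=edges, density=True)
        jsd_vals.append(jensenshannon(p, q) ** 2)  # squared JS distance = divergence
        lag1_vals.append(abs(lag1_autocorr(real[..., f]) - lag1_autocorr(synth[..., f])))
    # Report in the conventional order: avg_ks / avg_jsd / avg_lag1_diff.
    return float(np.mean(ks_vals)), float(np.mean(jsd_vals)), float(np.mean(lag1_vals))
```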
docs/decisions.md (Normal file, 35 lines added)
@@ -0,0 +1,35 @@
# Design & Decision Log

## 2026-01-26 — Two-stage temporal backbone (GRU) + residual diffusion
- **Decision**: Add a stage-1 GRU trend model, then train diffusion on residuals.
- **Why**: Separate temporal consistency from distribution alignment.
- **Files**:
- `example/hybrid_diffusion.py` (added `TemporalGRUGenerator`)
- `example/train.py` (two-stage training + residual diffusion)
- `example/sample.py`, `example/export_samples.py` (trend + residual synthesis)
- `example/config.json` (temporal hyperparameters)
- **Expected effect**: improve lag-1 consistency; may hurt KS if residual distribution drifts.
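
A rough sketch of the two-stage idea for readers of this log. The class below only mirrors the role of `TemporalGRUGenerator`; its name, signature, and sizes are assumptions, not the actual code in `example/hybrid_diffusion.py`.

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Stage-1 trend model (illustrative stand-in for TemporalGRUGenerator)."""
    def __init__(self, num_features, hidden_size=64):
        super().__init__()
        self.gru = nn.GRU(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_features)

    def forward(self, x):        # x: (batch, seq_len, num_features)
        h, _ = self.gru(x)
        return self.head(h)      # smooth trend, same shape as x

def residuals_for_diffusion(x, trend_model):
    """Stage 2 trains the diffusion model on x - trend instead of raw x."""
    with torch.no_grad():
        trend = trend_model(x)
    return x - trend
```

At sampling time the stages are recombined: residuals drawn from the diffusion model are added back onto the GRU trend.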

## 2026-01-26 — Residual distribution alignment losses
- **Decision**: Apply distribution losses to residuals (not raw x0).
- **Why**: The diffusion stage models residuals, so alignment should target the residual distribution.
- **Files**:
- `example/train.py` (quantile loss on residuals)
- `example/config.json` (quantile weight)
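
A minimal sketch of what a quantile loss on residuals can look like; the quantile levels and the config key are assumptions, the actual loss lives in `example/train.py`.

```python
import torch

def residual_quantile_loss(pred_res, true_res, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Match empirical quantiles of predicted vs. real residuals, per feature."""
    q = torch.tensor(quantiles, device=pred_res.device, dtype=pred_res.dtype)
    # Pool batch and time so the quantiles describe the residual distribution.
    pred_q = torch.quantile(pred_res.flatten(0, 1), q, dim=0)
    true_q = torch.quantile(true_res.flatten(0, 1), q, dim=0)
    return torch.mean(torch.abs(pred_q - true_q))

# Hypothetical use inside the training step (key name assumed):
# loss = diffusion_loss + cfg["quantile_weight"] * residual_quantile_loss(res_hat, res)
```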

## 2026-01-26 — SNR-weighted loss + residual stats
- **Decision**: Add SNR-weighted loss and residual mean/std regularization.
- **Why**: Stabilize diffusion training and improve KS.
- **Files**:
- `example/train.py`
- `example/config.json`
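
A hedged sketch of both ingredients: the weighting follows the common min-SNR style (clip the signal-to-noise ratio and divide by it), and the stats penalty pulls generated residual mean/std toward the real residuals; the exact formulation in `example/train.py` may differ.

```python
import torch

def snr_weighted_mse(noise_pred, noise, alphas_cumprod, t, gamma=5.0):
    """Weight the denoising MSE by min(SNR, gamma) / SNR per timestep (illustrative)."""
    snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])   # per-sample SNR, shape (batch,)
    weight = torch.clamp(snr, max=gamma) / snr             # down-weights easy, high-SNR steps
    per_sample = ((noise_pred - noise) ** 2).flatten(1).mean(dim=1)
    return (weight * per_sample).mean()

def residual_stats_penalty(res_hat, res_real):
    """Pull generated residual mean/std toward the real residuals' statistics."""
    return ((res_hat.mean() - res_real.mean()) ** 2
            + (res_hat.std() - res_real.std()) ** 2)
```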

## 2026-01-26 — Switchable backbone (GRU vs Transformer)
- **Decision**: Make the diffusion backbone configurable (`backbone_type`) with a Transformer encoder option.
- **Why**: Test whether self-attention reduces temporal vs distribution competition without altering the two-stage design.
- **Files**:
- `example/hybrid_diffusion.py`
- `example/train.py`
- `example/sample.py`
- `example/export_samples.py`
- `example/config.json`
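
A minimal sketch of how the `backbone_type` switch might be wired inside the denoiser; apart from `backbone_type`, the config keys and layer sizes are assumptions.

```python
import torch.nn as nn

def build_backbone(cfg):
    """Return a sequence encoder for the denoiser based on cfg['backbone_type']."""
    d_model = cfg.get("hidden_size", 128)            # hypothetical key
    backbone = cfg.get("backbone_type", "gru")
    if backbone == "gru":
        return nn.GRU(d_model, d_model, batch_first=True)
    if backbone == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True
        )
        return nn.TransformerEncoder(layer, num_layers=cfg.get("num_layers", 2))
    raise ValueError(f"unknown backbone_type: {backbone}")

# Note: nn.GRU returns (output, hidden) while TransformerEncoder returns a tensor,
# so the caller has to normalize the output before the denoising head.
```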

docs/experiments.md (Normal file, 29 lines added)
@@ -0,0 +1,29 @@
# Experiment Log

## Format
```
YYYY-MM-DD
- Config: <config file or key overrides>
- Result: avg_ks / avg_jsd / avg_lag1_diff
- Notes
```

## 2026-01-26
- Config: `example/config_no_temporal.json` (baseline)
- Result: 0.6474156 / 0.0576699 / 0.1981700
- Notes: no temporal stage; better KS, worse lag-1.

## 2026-01-26
- Config: `example/config_temporal_strong.json` (two-stage)
- Result: 0.6892453 / 0.0564408 / 0.1568776
- Notes: lag-1 improves, KS degrades; residual drift remains.

## 2026-01-26
- Config: `example/config.json` (two-stage residual diffusion; user run on Windows)
- Result: 0.7131993 / 0.0327603 / 0.2327633
- Notes: user-reported metrics after temporal stage + residual diffusion.

## 2026-01-26
- Config: `example/config.json` (two-stage residual diffusion; user run on Windows)
- Result: 0.7096230 / 0.0331810 / 0.1898416
- Notes: slight KS improvement, lag-1 improves; still distribution/temporal trade-off.

docs/ideas.md (Normal file, 13 lines added)
@@ -0,0 +1,13 @@
# Ideas & Hypotheses

## Transformer as backbone (Plan B)
- Hypothesis: self-attention may better capture long-range dependencies and reduce the conflict between temporal consistency and distribution matching.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config.
- Experiment: compare GRU vs Transformer using `run_compare.py`.
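
One lightweight way to set up that comparison is to derive two configs that differ only in the backbone (sketch; the derived file names are hypothetical, only `backbone_type` and `example/config.json` appear in this repo):

```python
import json

with open("example/config.json") as f:
    base = json.load(f)

for backbone in ("gru", "transformer"):
    cfg = dict(base, backbone_type=backbone)
    out = f"example/config_{backbone}.json"   # hypothetical derived configs
    with open(out, "w") as f:
        json.dump(cfg, f, indent=2)
# run_compare.py can then evaluate both runs with the same metric order.
```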

## Residual standardization
- Hypothesis: standardizing residuals before diffusion reduces drift and improves KS.
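
A minimal sketch of the hypothesis, assuming per-feature statistics pooled over batch and time; the class name is illustrative.

```python
import torch

class ResidualStandardizer:
    """Standardize residuals before diffusion; invert after sampling (illustrative)."""
    def fit(self, residuals):                 # residuals: (batch, seq_len, num_features)
        self.mean = residuals.mean(dim=(0, 1), keepdim=True)
        self.std = residuals.std(dim=(0, 1), keepdim=True).clamp_min(1e-6)
        return self

    def transform(self, residuals):
        return (residuals - self.mean) / self.std

    def inverse(self, standardized):
        return standardized * self.std + self.mean

# Sampling would then be: x = trend + standardizer.inverse(diffusion_samples)
```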

## Two-stage training with curriculum
- Hypothesis: train diffusion on residuals only after temporal GRU converges to low error.
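
A sketch of one possible gate; the threshold and patience values are made up for illustration.

```python
def ready_for_stage_two(gru_val_losses, threshold=0.05, patience=3):
    """Begin residual diffusion training only after the GRU trend model has stayed
    below `threshold` validation loss for `patience` consecutive evaluations."""
    recent = gru_val_losses[-patience:]
    return len(recent) == patience and all(v < threshold for v in recent)

# Stage-1 loop sketch:
# for epoch in range(max_gru_epochs):
#     train_gru_epoch(...); val_losses.append(validate_gru(...))
#     if ready_for_stage_two(val_losses):
#         break  # freeze the GRU, compute residuals, start diffusion training
```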