# Design & Decision Log
## 2026-01-26 — Two-stage temporal backbone (GRU) + residual diffusion
- **Decision**: Add a stage-1 GRU trend model, then train the diffusion model on the residuals.
- **Why**: Separate temporal consistency from distribution alignment.
- **Files**:
  - `example/hybrid_diffusion.py` (added `TemporalGRUGenerator`)
  - `example/train.py` (two-stage training + residual diffusion)
  - `example/sample.py`, `example/export_samples.py` (trend + residual synthesis)
  - `example/config.json` (temporal hyperparameters)
- **Expected effect**: Improves lag-1 consistency; may hurt KS if the residual distribution drifts.

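
The trend-plus-residual split in this entry can be sketched in a few lines. This is an illustration only: the causal moving average is a stand-in for the stage-1 GRU, and all function names are hypothetical rather than taken from `example/train.py` or `example/sample.py`.

```python
from typing import Callable, List, Sequence, Tuple


def moving_average_trend(series: Sequence[float], window: int = 3) -> List[float]:
    """Stand-in for the stage-1 trend model (assumption: the real one is a GRU)."""
    trend = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def split_trend_residual(
    series: Sequence[float],
    trend_fn: Callable[[Sequence[float]], List[float]] = moving_average_trend,
) -> Tuple[List[float], List[float]]:
    """Stage 1: fit the trend. The residual is what stage 2 (diffusion) would model."""
    trend = trend_fn(series)
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual


def synthesize(trend: Sequence[float], sampled_residual: Sequence[float]) -> List[float]:
    """Sampling: add a (diffusion-sampled) residual back onto the trend."""
    return [t + r for t, r in zip(trend, sampled_residual)]
```

Feeding the true residual back through `synthesize` reproduces the input up to floating-point error, which is a cheap sanity check for the decomposition before any diffusion training is involved.
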
## 2026-01-26 — Residual distribution alignment losses
- **Decision**: Apply distribution losses to residuals (not raw x0).
- **Why**: The diffusion model is trained on residuals, so the alignment losses should target the residual distribution.
- **Files**:
  - `example/train.py` (quantile loss on residuals)
  - `example/config.json` (quantile weight)

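
As a rough illustration of a quantile loss on residuals, the sketch below compares a few empirical quantiles of real and generated residuals. The quantile levels and function names are assumptions, and a training loop would need a differentiable tensor version rather than this plain-Python one.

```python
from typing import Sequence


def empirical_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of pre-sorted values, with q in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def residual_quantile_loss(
    real_residuals: Sequence[float],
    generated_residuals: Sequence[float],
    levels: Sequence[float] = (0.05, 0.25, 0.5, 0.75, 0.95),  # hypothetical choice
) -> float:
    """Mean absolute gap between matching quantiles of the two residual samples."""
    real_sorted = sorted(real_residuals)
    gen_sorted = sorted(generated_residuals)
    gaps = [
        abs(empirical_quantile(real_sorted, q) - empirical_quantile(gen_sorted, q))
        for q in levels
    ]
    return sum(gaps) / len(gaps)
```

The loss is zero when both samples share the same quantiles and grows with a shift or a tail mismatch, which is the behaviour the alignment term is meant to penalize.
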
## 2026-01-26 — SNR-weighted loss + residual stats
- **Decision**: Add SNR-weighted loss and residual mean/std regularization.
- **Why**: Stabilize diffusion training and improve KS.
- **Files**:
  - `example/train.py`
  - `example/config.json`

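
The log does not say which SNR weighting is used. As one possibility, the sketch below computes the per-step signal-to-noise ratio of a DDPM forward process, SNR(t) = alpha_bar_t / (1 - alpha_bar_t), and applies a Min-SNR-style clamp for a noise-prediction loss. The linear beta schedule, the `gamma` value, and the function names are all assumptions.

```python
from typing import List, Sequence


def linear_beta_schedule(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Evenly spaced noise variances (assumed schedule, not read from config.json)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def snr_per_step(betas: Sequence[float]) -> List[float]:
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t), alpha_bar_t = prod(1 - beta_s)."""
    snrs = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


def min_snr_weights(snrs: Sequence[float], gamma: float = 5.0) -> List[float]:
    """Per-step loss weights min(SNR, gamma) / SNR for an epsilon-prediction loss."""
    return [min(s, gamma) / s for s in snrs]
```

With this weighting, low-noise steps (very high SNR) are down-weighted while high-noise steps keep a weight of 1, so no single range of timesteps dominates the gradient.
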
## 2026-01-26 — Switchable backbone (GRU vs Transformer)
- **Decision**: Make the diffusion backbone configurable (`backbone_type`) with a Transformer encoder option.
- **Why**: Test whether self-attention reduces the competition between temporal and distributional objectives without altering the two-stage design.
- **Files**:
  - `example/hybrid_diffusion.py`
  - `example/train.py`
  - `example/sample.py`
  - `example/export_samples.py`
  - `example/config.json`

## 2026-01-26 — Per-feature KS diagnostics
- **Decision**: Add a per-feature KS/CDF diagnostic script to pinpoint KS failures (tails, boundary pile-up, shifts).
- **Why**: Avoid blind reweighting and identify the specific features that keep KS high.
- **Files**:
  - `example/diagnose_ks.py`

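
A minimal sketch of the kind of per-feature check such a script might run, assuming the bounds are taken from the real data's min and max. The function names and the returned keys are hypothetical and not the interface of `example/diagnose_ks.py`.

```python
import statistics
from typing import Dict, Sequence, Tuple


def boundary_mass(
    values: Sequence[float], lower: float, upper: float, tol: float = 1e-6
) -> Tuple[float, float]:
    """Fraction of values sitting (within tol) on the lower and upper bound."""
    n = len(values)
    at_lower = sum(1 for v in values if v <= lower + tol) / n
    at_upper = sum(1 for v in values if v >= upper - tol) / n
    return at_lower, at_upper


def diagnose_feature(
    real: Sequence[float], generated: Sequence[float], tol: float = 1e-6
) -> Dict[str, float]:
    """Report excess boundary mass and median shift of generated vs real values."""
    lower, upper = min(real), max(real)
    real_lo, real_hi = boundary_mass(real, lower, upper, tol)
    gen_lo, gen_hi = boundary_mass(generated, lower, upper, tol)
    return {
        "pileup_lower": gen_lo - real_lo,  # > 0 means extra mass at the min bound
        "pileup_upper": gen_hi - real_hi,  # > 0 means extra mass at the max bound
        "median_shift": statistics.median(generated) - statistics.median(real),
    }
```

A large positive `pileup_upper` points at boundary pile-up, while a large `median_shift` with little pile-up points at a location shift, so the two failure modes can be told apart per feature.
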
## 2026-01-26 — Quantile transform + sigmoid bounds for continuous features
- **Decision**: Add optional quantile normalization (TabDDPM-style) and sigmoid-based bounds to reduce KS spikes.
- **Why**: KS failures are dominated by boundary pile-up and tail mismatch.
- **Files**:
  - `example/data_utils.py`
  - `example/prepare_data.py`
  - `example/export_samples.py`
  - `example/config.json`

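
The forward quantile transform can be illustrated with the standard library alone: rank each value against the training data, then push the rank through the inverse normal CDF. This is a sketch under assumptions (mid-rank tie handling, clipping of extreme probabilities, hypothetical function names), not the code in `example/data_utils.py`.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist
from typing import List, Sequence


def fit_quantile_table(train_values: Sequence[float]) -> List[float]:
    """The 'fitted' transform is just the sorted training values."""
    return sorted(train_values)


def to_normal(value: float, table: Sequence[float]) -> float:
    """Map a raw value to an approximately N(0, 1) score via its empirical CDF rank."""
    n = len(table)
    # Mid-rank: ties land in the middle of their block instead of at one edge.
    rank = (bisect_left(table, value) + bisect_right(table, value)) / 2.0
    # Clip away from 0 and 1 so the inverse CDF stays finite.
    p = min(max(rank / n, 0.5 / n), 1.0 - 0.5 / n)
    return NormalDist().inv_cdf(p)
```

Because the mapping depends only on ranks, heavy tails and skew in the raw feature are flattened into a roughly Gaussian input for the diffusion model; values outside the training range are clamped to the most extreme score.
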
## 2026-01-27 — Quantile transform without extra standardization
- **Decision**: When quantile transform is enabled, skip mean/std normalization (the quantile output is already approximately N(0, 1)).
- **Why**: Prevent the scale mismatch that pushed values to the max bounds and blew up KS.
- **Files**:
  - `example/data_utils.py`
  - `example/export_samples.py`

## 2026-01-27 — Soft bounds + post-scale for boundary pile-up
- **Decision**: Replace hard sigmoid bounds with soft tanh bounds and allow per-feature post-scaling.
- **Why**: Many continuous features collapsed to the max bound (KS=1.0).
- **Files**:
  - `example/export_samples.py`
  - `example/config.json`

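
The exact bounding formula is not recorded in this log. One plausible form of a soft tanh bound is sketched below: it is close to the identity near the centre of the range and saturates smoothly toward the bounds. The formula, the meaning given to `post_scale`, and the function name are assumptions.

```python
import math


def soft_bound(
    value: float, lower: float, upper: float, post_scale: float = 1.0
) -> float:
    """Smoothly squash value into [lower, upper] around the range centre.

    post_scale is a hypothetical per-feature factor applied to the deviation
    from the centre before squashing; values below 1 reduce saturation.
    """
    center = 0.5 * (lower + upper)
    half_range = 0.5 * (upper - lower)
    squashed = math.tanh(post_scale * (value - center) / half_range)
    return center + half_range * squashed
```

Unlike a hard clip, two different out-of-range inputs still map to two different outputs just inside the bound, so mass is spread near the edge instead of stacking on a single value.
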
## 2026-01-27 — Post-hoc quantile calibration
- **Decision**: Add optional post-hoc quantile calibration to align generated 1D CDFs with real data.
- **Why**: KS remained high with distribution shifts even after boundary fixes.
- **Files**:
  - `example/data_utils.py`
  - `example/export_samples.py`
  - `example/prepare_data.py`
  - `example/config.json`

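
Post-hoc quantile calibration for a single feature can be sketched as rank matching: each generated value is replaced by the reference quantile at the same rank. The function name and the interpolation details are assumptions, not the implementation in `example/data_utils.py`.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def calibrate_to_reference(
    generated: Sequence[float], reference: Sequence[float]
) -> List[float]:
    """Map each generated value to the reference quantile at the same rank."""
    gen_sorted = sorted(generated)
    ref_sorted = sorted(reference)
    n_gen, n_ref = len(gen_sorted), len(ref_sorted)
    calibrated = []
    for value in generated:
        # Mid-rank position of the value within the generated sample.
        mid_rank = (bisect_left(gen_sorted, value) + bisect_right(gen_sorted, value)) / 2.0
        # Rescale so the smallest value maps to p = 0 and the largest to p = 1.
        p = (mid_rank - 0.5) / (n_gen - 1) if n_gen > 1 else 0.5
        # Linear interpolation into the sorted reference values.
        pos = p * (n_ref - 1)
        lo = int(pos)
        hi = min(lo + 1, n_ref - 1)
        frac = pos - lo
        calibrated.append(ref_sorted[lo] * (1.0 - frac) + ref_sorted[hi] * frac)
    return calibrated


print(calibrate_to_reference([10.0, 30.0, 20.0], [0.0, 1.0, 2.0]))
# prints [0.0, 2.0, 1.0]
```

The mapping preserves the ordering of the generated values within the feature, so it corrects the marginal CDF without reshuffling samples. It does nothing for cross-feature structure, which still has to come from the model.
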
## 2026-01-27 — Full quantile stats in preparation
- **Decision**: Enable full statistics when quantile transform is active.
- **Why**: Stabilize quantile tables and reduce CDF mismatch.
- **Files**:
  - `example/prepare_data.py`
  - `example/config.json`

## 2026-01-27 — Filtered KS for diagnostics
- **Decision**: Add a filtered KS metric that excludes collapsed/outlier features.
- **Why**: Keep a handful of features from dominating the aggregate KS, while still reporting the full KS.
- **Files**:
  - `example/filtered_metrics.py`
  - `example/run_all_full.py`

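
One way to report both numbers side by side is sketched below. How "collapsed" and "outlier" are actually defined is not stated in this log, so the explicit `collapsed_features` set, the `outlier_threshold` value, and the output keys are assumptions.

```python
from typing import Dict, Iterable


def summarize_ks(
    per_feature_ks: Dict[str, float],
    collapsed_features: Iterable[str],
    outlier_threshold: float = 0.9,  # hypothetical cutoff
) -> Dict[str, object]:
    """Return the full mean KS plus a mean over non-collapsed, non-outlier features."""
    collapsed = set(collapsed_features)
    kept = {
        name: ks
        for name, ks in per_feature_ks.items()
        if name not in collapsed and ks < outlier_threshold
    }
    full_mean = sum(per_feature_ks.values()) / len(per_feature_ks)
    filtered_mean = sum(kept.values()) / len(kept) if kept else float("nan")
    return {
        "ks_full": full_mean,
        "ks_filtered": filtered_mean,
        "excluded": sorted(set(per_feature_ks) - set(kept)),
    }
```

Returning the list of excluded features alongside both means keeps the filtering visible, so the filtered number cannot quietly replace the full one.
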
## 2026-01-28 — Tie-aware KS + full-reference aggregation
- **Decision**: Fix KS computation to handle ties correctly and aggregate all reference files matched by glob.
- **Why**: Spiky/quantized features had their KS overstated; a single-file reference was misleading.
- **Files**:
  - `example/evaluate_generated.py`

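
A tie-aware two-sample KS statistic can be written as a merge over the two sorted samples, as long as both empirical CDFs are advanced past every copy of a value before the gap is measured. This is a generic sketch with a hypothetical function name, not the code in `example/evaluate_generated.py`.

```python
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    max_gap = 0.0
    while i < n_a and j < n_b:
        value = min(a[i], b[j])
        # Consume all ties at `value` in BOTH samples before comparing CDFs;
        # comparing mid-block is what inflates KS on quantized features.
        while i < n_a and a[i] == value:
            i += 1
        while j < n_b and b[j] == value:
            j += 1
        max_gap = max(max_gap, abs(i / n_a - j / n_b))
    return max_gap


print(ks_statistic([0, 0, 1, 1], [0, 0, 1, 1]))  # prints 0.0
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # prints 0.5
```

Once one sample is exhausted its CDF is 1 and the remaining gap can only shrink, so the loop can stop there without missing the maximum.
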
## 2026-01-28 — KS-only postprocess baseline
- **Decision**: Add an empirical resampling mode for Type1/2/3/5/6 to aggressively reduce KS.
- **Why**: Provide a diagnostic best-case reference for KS (how low it can go) without retraining.
- **Files**:
  - `example/postprocess_types.py`
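
Empirical resampling for one feature amounts to drawing with replacement from the real values. The sketch below assumes that simple form; the function name and the fixed seed are hypothetical, and the actual handling of each type in `example/postprocess_types.py` may differ.

```python
import random
from typing import List, Sequence


def resample_from_reference(
    reference_values: Sequence[float], num_samples: int, seed: int = 0
) -> List[float]:
    """Draw num_samples values with replacement from the real feature values."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return rng.choices(reference_values, k=num_samples)
```

If each feature is resampled independently like this, the marginal CDF matches the reference almost by construction, but any temporal or cross-feature structure produced by the model is discarded. That is why the result is useful as a diagnostic baseline rather than as a model improvement.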