# Ideas & Hypotheses

## Transformer as backbone (Plan B)
- Hypothesis: self-attention may capture long-range dependencies better than the GRU and reduce the conflict between temporal consistency and distribution matching.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in the config.
- Experiment: compare the GRU and Transformer backbones with `run_compare.py` (see the config sketch below).
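
A minimal sketch of what the backbone switch might look like, assuming a PyTorch-style setup and a plain dict config; only `backbone_type = "transformer"` comes from the note above, while `input_dim`, `hidden_dim`, `n_heads`, `n_layers`, and `build_backbone` are illustrative names, not the repo's actual config schema.

```python
import torch.nn as nn

def build_backbone(cfg: dict) -> nn.Module:
    """Return the sequence backbone selected by cfg["backbone_type"]."""
    if cfg["backbone_type"] == "transformer":
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=cfg["hidden_dim"],
            nhead=cfg.get("n_heads", 4),
            dim_feedforward=4 * cfg["hidden_dim"],
            batch_first=True,  # (batch, time, features), same layout as the GRU
        )
        return nn.Sequential(
            nn.Linear(cfg["input_dim"], cfg["hidden_dim"]),  # project inputs to d_model
            nn.TransformerEncoder(encoder_layer, num_layers=cfg.get("n_layers", 2)),
        )
    # Default backbone: GRU over (batch, time, features).
    # Note: nn.GRU.forward returns (output, h_n) and must be unpacked,
    # while the transformer path returns a single tensor.
    return nn.GRU(cfg["input_dim"], cfg["hidden_dim"], batch_first=True)
```

Keeping both paths batch-first over `(batch, time, features)` means the rest of the pipeline can stay unchanged when switching backbones.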
## Residual standardization
- Hypothesis: standardizing residuals before the diffusion stage reduces drift and improves KS (see the sketch below).
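
A minimal sketch of the idea, assuming residual windows come as a NumPy array of shape `(num_windows, seq_len, num_features)`; `ResidualScaler` and its method names are hypothetical, not existing code.

```python
import numpy as np

class ResidualScaler:
    """Per-feature standardization of residuals before the diffusion stage (illustrative)."""

    def fit(self, residuals: np.ndarray) -> "ResidualScaler":
        # residuals: (num_windows, seq_len, num_features); statistics are taken
        # over windows and time so each feature gets a single mean/std.
        self.mean_ = residuals.mean(axis=(0, 1), keepdims=True)
        self.std_ = residuals.std(axis=(0, 1), keepdims=True) + 1e-8
        return self

    def transform(self, residuals: np.ndarray) -> np.ndarray:
        return (residuals - self.mean_) / self.std_

    def inverse_transform(self, z: np.ndarray) -> np.ndarray:
        # Applied to diffusion samples before adding them back onto the temporal baseline.
        return z * self.std_ + self.mean_
```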
## Two-stage training with curriculum
- Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low error, so the residual distribution is stable before the diffusion stage starts fitting it (see the sketch below).
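
A rough sketch of the curriculum, assuming two separate training loops; `train_gru_epoch`, `train_diffusion_epoch`, the error threshold, and the epoch counts are hypothetical placeholders rather than the repo's actual training code.

```python
def two_stage_train(gru, diffusion, loader, train_gru_epoch, train_diffusion_epoch,
                    gru_error_threshold=1e-3, max_gru_epochs=200, diffusion_epochs=100):
    """Stage 1: fit the temporal GRU; stage 2: fit diffusion on its residuals.

    train_gru_epoch(gru, loader) -> float and train_diffusion_epoch(diffusion, gru, loader)
    are assumed callables standing in for the project's real training loops.
    """
    # Stage 1: train the GRU until its error falls below the curriculum threshold.
    for _ in range(max_gru_epochs):
        error = train_gru_epoch(gru, loader)
        if error < gru_error_threshold:
            break

    # Stage 2: freeze the GRU so the residual targets stop moving,
    # then train the diffusion model on residuals of the frozen backbone.
    for p in gru.parameters():
        p.requires_grad_(False)
    gru.eval()
    for _ in range(diffusion_epochs):
        train_diffusion_epoch(diffusion, gru, loader)
```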
## Discrete calibration
- Hypothesis: post-hoc calibration of the discrete marginals can reduce JSD without harming KS (see the sketch below).
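
One way such a calibration could look, sketched under the assumption that each discrete feature is calibrated independently against its real marginal with a minimal number of changed entries; `calibrate_discrete_column` is a hypothetical helper, not an existing function.

```python
import numpy as np

def calibrate_discrete_column(gen: np.ndarray, real: np.ndarray, seed: int = 0) -> np.ndarray:
    """Nudge the marginal of one generated discrete column toward the real one.

    Minimal-change matching: only randomly chosen entries of over-represented
    categories are reassigned to under-represented categories.
    """
    rng = np.random.default_rng(seed)
    values = np.union1d(real, gen)
    target = np.round([(real == v).mean() * len(gen) for v in values]).astype(int)
    counts = np.array([(gen == v).sum() for v in values])
    out = gen.copy()

    # Indices we are allowed to overwrite: random surplus entries per category.
    surplus = [i for v, c, t in zip(values, counts, target) if c > t
               for i in rng.choice(np.flatnonzero(out == v), size=c - t, replace=False)]
    surplus = rng.permutation(np.array(surplus, dtype=int))

    # Fill each category's deficit from the surplus pool.
    pos = 0
    for v, c, t in zip(values, counts, target):
        need = max(t - c, 0)
        out[surplus[pos:pos + need]] = v
        pos += need
    return out
```

Because only entries of the discrete columns are rewritten, continuous features and their KS statistics are untouched by construction; JSD before and after can be checked with `scipy.spatial.distance.jensenshannon`, keeping in mind it returns the JS distance (the square root of the divergence).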
## Feature-type split modeling
- Hypothesis: generating each feature type separately (setpoints, controllers, actuators, quantized, derived, aux) yields better overall fidelity than one joint model (see the sketch below).
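
A minimal sketch of the split-and-reassemble step, assuming column indices can be grouped by feature type and each group gets its own generator; the `FEATURE_GROUPS` indices and the `.sample(...)` interface are made up for illustration.

```python
import numpy as np

# Hypothetical grouping of column indices by feature type; the real split would
# come from the dataset's feature metadata.
FEATURE_GROUPS = {
    "setpoints":   [0, 1],
    "controllers": [2, 3, 4],
    "actuators":   [5, 6],
    "quantized":   [7],
    "derived":     [8, 9],
    "aux":         [10],
}

def sample_by_feature_type(generators: dict, num_windows: int, seq_len: int,
                           num_features: int) -> np.ndarray:
    """Sample each feature group with its own generator and reassemble full windows.

    generators[name].sample(num_windows, seq_len, len(cols)) is an assumed
    interface, not the project's actual API.
    """
    out = np.zeros((num_windows, seq_len, num_features), dtype=np.float32)
    for name, cols in FEATURE_GROUPS.items():
        out[:, :, cols] = generators[name].sample(num_windows, seq_len, len(cols))
    return out
```

The main risk is that cross-group dependencies (e.g., a controller tracking its setpoint) are preserved only if the group generators share a common conditioning signal, which this sketch omits.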