# Ideas & Hypotheses

## Transformer as backbone (Plan B)

- Hypothesis: self-attention may better capture long-range dependencies and reduce the conflict between temporal consistency and distribution matching.
- Risk: higher compute cost and potentially less stable training.
- Status: implemented as `backbone_type = "transformer"` in config (a hypothetical sketch of the switch follows at the end of this section).
- Experiment: compare GRU vs Transformer using `run_compare.py`.

## Residual standardization

- Hypothesis: standardizing residuals before diffusion reduces drift and improves KS (see the standardization sketch below).

## Two-stage training with curriculum

- Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low error (see the gating sketch below).

## Discrete calibration

- Hypothesis: post-hoc calibration on discrete marginals can reduce JSD without harming KS (see the calibration sketch below).

## Feature-type split modeling

- Hypothesis: separate generation per feature type (setpoints, controllers, actuators, quantized, derived, aux) yields better overall fidelity.
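For the backbone comparison, a minimal sketch of what a `backbone_type` switch could look like in PyTorch. The builder function, dimensions, layer counts, and head count are illustrative assumptions, not the project's actual config schema.

```python
# Hypothetical backbone switch; names and sizes are assumptions for illustration.
import torch
import torch.nn as nn

def build_backbone(backbone_type: str, d_model: int = 128, n_layers: int = 2) -> nn.Module:
    """Return a sequence encoder mapping (batch, time, d_model) -> (batch, time, d_model)."""
    if backbone_type == "gru":
        return nn.GRU(d_model, d_model, num_layers=n_layers, batch_first=True)
    if backbone_type == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True
        )
        return nn.TransformerEncoder(layer, num_layers=n_layers)
    raise ValueError(f"unknown backbone_type: {backbone_type}")

x = torch.randn(8, 64, 128)                # (batch, time, features)
gru_out, _ = build_backbone("gru")(x)      # GRU forward returns (output, hidden state)
tr_out = build_backbone("transformer")(x)  # TransformerEncoder returns the output only
```

Keeping both backbones behind the same (batch, time, d_model) interface would make the GRU vs Transformer comparison a pure config change.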
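For the residual standardization idea, a minimal sketch assuming residuals shaped (batch, time, features) and per-feature z-scoring; the function names are hypothetical.

```python
# Sketch: z-score residuals per feature before diffusion, invert after sampling.
import torch

def standardize_residuals(residuals: torch.Tensor, eps: float = 1e-6):
    """Standardize per feature; return z-scores plus the stats needed for inversion."""
    mean = residuals.mean(dim=(0, 1), keepdim=True)
    std = residuals.std(dim=(0, 1), keepdim=True).clamp_min(eps)
    return (residuals - mean) / std, mean, std

def destandardize(samples: torch.Tensor, mean: torch.Tensor, std: torch.Tensor):
    """Map diffusion samples back to the original residual scale."""
    return samples * std + mean

res = torch.randn(8, 64, 12)               # residuals: (batch, time, features)
z, mu, sigma = standardize_residuals(res)  # feed `z` to the diffusion model
recovered = destandardize(z, mu, sigma)    # invert after sampling
```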
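For the two-stage curriculum, a sketch of a simple convergence gate that could decide when to start the diffusion stage; the threshold and patience values are illustrative assumptions.

```python
# Sketch: start stage two (diffusion on residuals) only after the temporal model's
# validation loss stays below a threshold for `patience` consecutive epochs.
from typing import Sequence

def stage_two_ready(val_losses: Sequence[float], threshold: float = 0.05, patience: int = 5) -> bool:
    """True when the last `patience` validation losses are all below `threshold`."""
    if len(val_losses) < patience:
        return False
    return all(loss < threshold for loss in val_losses[-patience:])
```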
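For discrete calibration, one possible post-hoc scheme (an assumption, not an established part of the pipeline): resample a small fraction of generated values out of over-represented categories into under-represented ones, moving the generated marginal toward the real one. This lowers JSD on that feature while leaving continuous features, and hence KS, untouched.

```python
# Sketch: marginal-matching resampling for one discrete feature. All names are
# hypothetical; `real_marginal` maps category -> target frequency.
import numpy as np

def calibrate_discrete(gen_values: np.ndarray, real_marginal: dict, seed: int = 0) -> np.ndarray:
    """Resample part of `gen_values` so category frequencies approach `real_marginal`."""
    rng = np.random.default_rng(seed)
    cats = np.array(list(real_marginal.keys()))
    target = np.array([real_marginal[c] for c in cats], dtype=float)
    gen_freq = np.array([(gen_values == c).mean() for c in cats])
    deficit = np.clip(target - gen_freq, 0.0, None)   # missing mass per category
    if deficit.sum() == 0:
        return gen_values
    out = gen_values.copy()
    surplus_idx = np.where(np.isin(out, cats[gen_freq > target]))[0]
    n_fix = min(int(round(deficit.sum() * len(out))), len(surplus_idx))
    if n_fix == 0:
        return out
    replace_at = rng.choice(surplus_idx, size=n_fix, replace=False)
    out[replace_at] = rng.choice(cats, size=n_fix, p=deficit / deficit.sum())
    return out
```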