Ideas & Hypotheses

Transformer as backbone (Plan B)

  • Hypothesis: self-attention may better capture long-range dependencies and reduce the conflict between temporal consistency and distribution matching.
  • Risk: higher compute cost and potentially less stable training.
  • Status: implemented as backbone_type = "transformer" in config (see the switch sketch after this list).
  • Experiment: compare the GRU and Transformer backbones with run_compare.py.
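
A minimal sketch of what the backbone switch could look like, assuming a PyTorch sequence model that maps (batch, time, features) to the same shape. Only backbone_type = "transformer" comes from the config above; the builder function, dimensions, and GRUBackbone wrapper are illustrative.

```python
# Illustrative backbone switch; only backbone_type itself comes from the config.
import torch
import torch.nn as nn


def build_backbone(backbone_type: str, dim: int = 64, layers: int = 2) -> nn.Module:
    """Return a module mapping (batch, time, dim) -> (batch, time, dim)."""
    if backbone_type == "gru":
        class GRUBackbone(nn.Module):
            def __init__(self):
                super().__init__()
                self.rnn = nn.GRU(dim, dim, num_layers=layers, batch_first=True)

            def forward(self, x):
                out, _ = self.rnn(x)
                return out

        return GRUBackbone()

    if backbone_type == "transformer":
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True
        )
        return nn.TransformerEncoder(layer, num_layers=layers)

    raise ValueError(f"unknown backbone_type: {backbone_type}")


x = torch.randn(8, 128, 64)                       # (batch, time, features)
print(build_backbone("gru")(x).shape)             # torch.Size([8, 128, 64])
print(build_backbone("transformer")(x).shape)     # torch.Size([8, 128, 64])
```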

Residual standardization

  • Hypothesis: standardizing residuals before diffusion reduces drift and improves the KS statistic (see the sketch below).
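
A minimal sketch of what the standardization step could look like, assuming residuals shaped (windows, time, features) and per-feature statistics fitted on the training split only; the function names are illustrative.

```python
# Per-feature residual standardization before diffusion; names are illustrative.
import numpy as np


def fit_residual_stats(residuals: np.ndarray, eps: float = 1e-8):
    """Per-feature mean/std over all windows and timesteps of the training residuals."""
    mean = residuals.mean(axis=(0, 1), keepdims=True)
    std = residuals.std(axis=(0, 1), keepdims=True) + eps
    return mean, std


def standardize(residuals, mean, std):
    return (residuals - mean) / std        # fed to the diffusion model


def destandardize(samples, mean, std):
    return samples * std + mean            # applied to generated residuals


resid = np.random.randn(100, 64, 12) * 3.0 + 1.5   # toy residuals (windows, time, features)
mean, std = fit_residual_stats(resid)
z = standardize(resid, mean, std)
print(z.mean(axis=(0, 1)).round(3))                # ~0 per feature
print(z.std(axis=(0, 1)).round(3))                 # ~1 per feature
```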

Two-stage training with curriculum

  • Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low error (see the two-stage sketch below).
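
A toy sketch of the curriculum: fit a one-step-ahead GRU first, then freeze it and train a stand-in epsilon-predicting denoiser on its residuals. The data, convergence threshold, and denoiser here are placeholders, not the repo's actual models or settings.

```python
# Two-stage curriculum sketch with placeholder data, models, and threshold.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
T, F = 64, 8
t = torch.linspace(0, 4 * math.pi, T)
data = torch.sin(t)[None, :, None] + 0.1 * torch.randn(256, T, F)   # toy series

# Stage 1: train the temporal GRU on one-step-ahead prediction.
gru = nn.GRU(F, F, batch_first=True)
opt = torch.optim.Adam(gru.parameters(), lr=1e-3)
for step in range(500):
    pred, _ = gru(data[:, :-1])
    loss = nn.functional.mse_loss(pred, data[:, 1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < 0.05:               # placeholder convergence criterion
        break

# Stage 2: freeze the GRU and train the diffusion/denoising model on residuals only.
for p in gru.parameters():
    p.requires_grad_(False)
with torch.no_grad():
    resid = data[:, 1:] - gru(data[:, :-1])[0]

denoiser = nn.Sequential(nn.Linear(F, 64), nn.ReLU(), nn.Linear(64, F))
opt2 = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
for step in range(500):
    noise = torch.randn_like(resid)
    loss = nn.functional.mse_loss(denoiser(resid + noise), noise)   # eps-prediction, single noise level
    opt2.zero_grad()
    loss.backward()
    opt2.step()
```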

Discrete calibration

  • Hypothesis: post-hoc calibration on discrete marginals can reduce JSD without harming KS (see the quantile-mapping sketch below).
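
One way such a calibration could work, sketched as monotone quantile mapping for an ordered discrete or quantized feature: the generated marginal is remapped onto the real empirical marginal, which drives JSD toward zero, and because the map is monotone it does not reorder samples. The function name and toy data are illustrative.

```python
# Post-hoc marginal calibration via monotone quantile mapping; names are illustrative.
import numpy as np


def calibrate_discrete_marginal(generated: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Remap generated values so their marginal matches the real empirical marginal."""
    values, counts = np.unique(real, return_counts=True)
    target_cdf = np.cumsum(counts) / counts.sum()            # real cumulative marginal

    # Empirical quantile of each generated value within the generated sample.
    ranks = np.argsort(np.argsort(generated, kind="stable"), kind="stable")
    quantiles = (ranks + 0.5) / generated.size

    # Monotone map: assign the real value whose CDF bracket contains each quantile.
    idx = np.searchsorted(target_cdf, quantiles, side="left")
    return values[np.clip(idx, 0, len(values) - 1)]


real = np.random.choice([0, 1, 2, 3], size=5000, p=[0.5, 0.3, 0.15, 0.05])
gen = np.random.choice([0, 1, 2, 3], size=5000, p=[0.25, 0.25, 0.25, 0.25])
cal = calibrate_discrete_marginal(gen, real)
print(np.bincount(cal) / cal.size)   # close to [0.5, 0.3, 0.15, 0.05]
```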

Feature-type split modeling

  • Hypothesis: separate generation per feature type (setpoints, controllers, actuators, quantized, derived, aux) yields better overall fidelity (see the split-generation sketch below).
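
An illustrative sketch of split generation: the feature-type names come from the bullet above, but the column grouping, stand-in generators, and reassembly function are assumptions.

```python
# Per-feature-type generation and reassembly; grouping and generators are assumed.
import numpy as np

FEATURE_GROUPS = {
    "setpoints":   [0, 1],
    "controllers": [2, 3, 4],
    "actuators":   [5, 6],
    "quantized":   [7],
    "derived":     [8, 9],
    "aux":         [10, 11],
}


def generate_split(generators: dict, n_windows: int, horizon: int, n_features: int = 12):
    """Generate each feature group with its own model, then reassemble the full tensor."""
    out = np.zeros((n_windows, horizon, n_features))
    for group, cols in FEATURE_GROUPS.items():
        out[:, :, cols] = generators[group](n_windows, horizon, len(cols))
    return out


# Stand-in generators; in practice each would be a separately trained model per type.
generators = {g: (lambda n, t, d: np.random.randn(n, t, d)) for g in FEATURE_GROUPS}
print(generate_split(generators, n_windows=4, horizon=32).shape)   # (4, 32, 12)
```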