Ideas & Hypotheses

Transformer as backbone (Plan B)

  • Hypothesis: self-attention may capture long-range dependencies better than the GRU and reduce the conflict between temporal consistency and distribution matching.
  • Risk: higher compute cost and potentially less stable training.
  • Status: implemented as backbone_type = "transformer" in config.
  • Experiment: compare GRU vs Transformer using run_compare.py; a backbone-dispatch sketch follows below.
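
A minimal sketch of how the config switch could dispatch to a Transformer backbone. Only the `backbone_type = "transformer"` key comes from the config; the class names, layer sizes, and the `build_backbone` helper are illustrative assumptions, not the repo's actual interfaces.

```python
import torch
import torch.nn as nn

class TransformerBackbone(nn.Module):
    """Illustrative Transformer encoder backbone; sizes are placeholders."""
    def __init__(self, d_model: int = 128, n_heads: int = 4, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); self-attention sees the whole sequence
        # at once, so long-range dependencies are modelled directly.
        return self.encoder(x)

def build_backbone(cfg: dict) -> nn.Module:
    # Dispatch on the existing config key; the GRU branch mirrors the current default.
    if cfg.get("backbone_type") == "transformer":
        return TransformerBackbone()
    return nn.GRU(input_size=128, hidden_size=128, batch_first=True)
```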

Residual standardization

  • Hypothesis: standardizing residuals before diffusion reduces drift and improves the KS (Kolmogorov–Smirnov) statistic; see the sketch below.
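
A minimal sketch of the standardization step, assuming residuals are shaped (batch, seq_len, features) and statistics are computed per feature; the function names are hypothetical. The returned statistics are needed to invert the transform on residuals generated at sampling time.

```python
import torch

def standardize_residuals(residuals: torch.Tensor, eps: float = 1e-6):
    """Standardize residuals per feature; return the stats needed to invert later."""
    mu = residuals.mean(dim=(0, 1), keepdim=True)    # per-feature mean over batch and time
    sigma = residuals.std(dim=(0, 1), keepdim=True)  # per-feature std over batch and time
    z = (residuals - mu) / (sigma + eps)
    return z, mu, sigma

def destandardize(z: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor, eps: float = 1e-6):
    # Invert the transform on residuals generated by the diffusion model.
    return z * (sigma + eps) + mu
```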

Two-stage training with curriculum

  • Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low error; a possible schedule is sketched below.
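
One possible two-stage schedule, as a sketch. It assumes the GRU returns a prediction with the same shape as its input, and the reconstruction-style loss and the `diffusion.loss(...)` call are assumed APIs, not the repo's actual interfaces.

```python
def train_two_stage(gru, diffusion, loader, gru_opt, diff_opt,
                    gru_epochs: int = 50, diff_epochs: int = 100):
    # Stage 1: fit the temporal GRU on the raw sequences.
    for _ in range(gru_epochs):
        for batch in loader:
            loss = ((gru(batch) - batch) ** 2).mean()
            gru_opt.zero_grad()
            loss.backward()
            gru_opt.step()

    # Stage 2: freeze the GRU and train the diffusion model on its residuals.
    for p in gru.parameters():
        p.requires_grad_(False)
    for _ in range(diff_epochs):
        for batch in loader:
            residuals = batch - gru(batch).detach()
            loss = diffusion.loss(residuals)  # assumed diffusion training-loss API
            diff_opt.zero_grad()
            loss.backward()
            diff_opt.step()
```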

Discrete calibration

  • Hypothesis: post-hoc calibration on discrete marginals can reduce JSD (Jensen–Shannon divergence) without harming KS; one simple scheme is sketched below.
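
One simple post-hoc scheme, as a sketch: correct the model's categorical probabilities for a discrete column by the ratio of real to generated marginal frequencies before sampling. Because only the discrete column's sampling probabilities change, the continuous marginals (and hence KS) are untouched. The function and argument names are illustrative, not part of the codebase.

```python
import numpy as np

def calibrate_categorical_probs(probs: np.ndarray,
                                real_freq: np.ndarray,
                                gen_freq: np.ndarray,
                                eps: float = 1e-12) -> np.ndarray:
    """Prior-correct per-sample category probabilities for one discrete column.

    probs:     (n_samples, n_categories) model probabilities before sampling.
    real_freq: empirical category frequencies in the real data.
    gen_freq:  empirical category frequencies in an uncalibrated generated batch.
    """
    ratio = real_freq / np.maximum(gen_freq, eps)  # per-category correction factor
    adjusted = probs * ratio                       # reweight each category
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```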