Ideas & Hypotheses

Transformer as backbone (Plan B)

  • Hypothesis: self-attention may capture long-range dependencies better than the GRU and reduce the conflict between temporal consistency and distribution matching.
  • Risk: higher compute cost and potentially less stable training.
  • Status: implemented as backbone_type = "transformer" in config.
  • Experiment: compare GRU vs Transformer using run_compare.py; a backbone-dispatch sketch follows below.
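
A minimal sketch of how the config switch could dispatch to a Transformer backbone. Only the `backbone_type = "transformer"` key comes from the config; the class names, layer sizes, and the `build_backbone` helper are illustrative assumptions, not the repo's actual interfaces.

```python
import torch
import torch.nn as nn

class TransformerBackbone(nn.Module):
    """Illustrative Transformer encoder backbone; sizes are placeholders."""
    def __init__(self, d_model: int = 128, n_heads: int = 4, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); self-attention sees the whole sequence
        # at once, so long-range dependencies are modelled directly.
        return self.encoder(x)

def build_backbone(cfg: dict) -> nn.Module:
    # Dispatch on the existing config key; the GRU branch mirrors the current default.
    if cfg.get("backbone_type") == "transformer":
        return TransformerBackbone()
    return nn.GRU(input_size=128, hidden_size=128, batch_first=True)
```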

Residual standardization

  • Hypothesis: standardizing residuals before diffusion reduces drift and improves the KS (Kolmogorov–Smirnov) statistic; see the sketch below.
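
A minimal sketch of the standardization step, assuming residuals are shaped (batch, seq_len, features) and statistics are computed per feature; the function names are hypothetical. The returned statistics are needed to invert the transform on residuals generated at sampling time.

```python
import torch

def standardize_residuals(residuals: torch.Tensor, eps: float = 1e-6):
    """Standardize residuals per feature; return the stats needed to invert later."""
    mu = residuals.mean(dim=(0, 1), keepdim=True)    # per-feature mean over batch and time
    sigma = residuals.std(dim=(0, 1), keepdim=True)  # per-feature std over batch and time
    z = (residuals - mu) / (sigma + eps)
    return z, mu, sigma

def destandardize(z: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor, eps: float = 1e-6):
    # Invert the transform on residuals generated by the diffusion model.
    return z * (sigma + eps) + mu
```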

Two-stage training with curriculum

  • Hypothesis: train the diffusion model on residuals only after the temporal GRU has converged to low error; a possible schedule is sketched below.
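
One possible two-stage schedule, as a sketch. It assumes the GRU returns a prediction with the same shape as its input, and the reconstruction-style loss and the `diffusion.loss(...)` call are assumed APIs, not the repo's actual interfaces.

```python
def train_two_stage(gru, diffusion, loader, gru_opt, diff_opt,
                    gru_epochs: int = 50, diff_epochs: int = 100):
    # Stage 1: fit the temporal GRU on the raw sequences.
    for _ in range(gru_epochs):
        for batch in loader:
            loss = ((gru(batch) - batch) ** 2).mean()
            gru_opt.zero_grad()
            loss.backward()
            gru_opt.step()

    # Stage 2: freeze the GRU and train the diffusion model on its residuals.
    for p in gru.parameters():
        p.requires_grad_(False)
    for _ in range(diff_epochs):
        for batch in loader:
            residuals = batch - gru(batch).detach()
            loss = diffusion.loss(residuals)  # assumed diffusion training-loss API
            diff_opt.zero_grad()
            loss.backward()
            diff_opt.step()
```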

Discrete calibration

  • Hypothesis: post-hoc calibration on discrete marginals can reduce JSD (Jensen–Shannon divergence) without harming KS; one simple scheme is sketched below.
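
One simple post-hoc scheme, as a sketch: correct the model's categorical probabilities for a discrete column by the ratio of real to generated marginal frequencies before sampling. Because only the discrete column's sampling probabilities change, the continuous marginals (and hence KS) are untouched. The function and argument names are illustrative, not part of the codebase.

```python
import numpy as np

def calibrate_categorical_probs(probs: np.ndarray,
                                real_freq: np.ndarray,
                                gen_freq: np.ndarray,
                                eps: float = 1e-12) -> np.ndarray:
    """Prior-correct per-sample category probabilities for one discrete column.

    probs:     (n_samples, n_categories) model probabilities before sampling.
    real_freq: empirical category frequencies in the real data.
    gen_freq:  empirical category frequencies in an uncalibrated generated batch.
    """
    ratio = real_freq / np.maximum(gen_freq, eps)  # per-category correction factor
    adjusted = probs * ratio                       # reweight each category
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```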