mask-ddpm/example/model_design.md
2026-01-09 02:14:20 +08:00


Hybrid Diffusion Design (HAI 21.03)

1) Data representation

  • Input sequence length: T (e.g., 64 or 128 time steps).
  • Continuous features: 53 columns (sensor/process values).
  • Discrete features: 30 columns (binary or low-cardinality states + attack labels).
  • Time column: excluded from modeling; index-based position/time embeddings are used instead.
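A minimal sketch of the batch shapes implied by the split above (all names are illustrative, not the repo's actual API):

```python
import numpy as np

B, T = 8, 64            # batch size, sequence length
N_CONT, N_DISC = 53, 30  # column counts from the split above

# Continuous sensor/process values as floats.
x_cont = np.random.randn(B, T, N_CONT).astype(np.float32)
# Discrete columns as integer category ids (binary/low-cardinality states).
x_disc = np.random.randint(0, 2, size=(B, T, N_DISC))
# Index-based positions stand in for the excluded time column.
positions = np.arange(T)

assert x_cont.shape == (B, T, N_CONT)
assert x_disc.shape == (B, T, N_DISC)
```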

2) Forward processes

Continuous (Gaussian DDPM)

  • Use a cosine beta schedule with timesteps=1000.
  • Forward: x_t = sqrt(a_bar_t) * x_0 + sqrt(1-a_bar_t) * eps, where a_bar_t = prod_{s<=t} (1 - beta_s) and eps ~ N(0, I).
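A numpy sketch of the two bullets above, assuming the Nichol & Dhariwal form of the cosine schedule (function names are illustrative):

```python
import numpy as np

def cosine_alpha_bar(timesteps=1000, s=0.008):
    """Cumulative a_bar_t from the cosine schedule: decreasing from ~1 to ~0."""
    t = np.arange(timesteps + 1) / timesteps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f[1:] / f[0]  # shape (timesteps,)

def q_sample_continuous(x0, t, alpha_bar, eps=None):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    if eps is None:
        eps = np.random.randn(*x0.shape)
    ab = alpha_bar[t]  # scalar a_bar_t for this timestep
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps
```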

Discrete (mask diffusion)

  • Use [MASK] replacement with probability p(t).
  • Simple linear schedule: p(t) = t / T_diff, where T_diff is the number of diffusion timesteps (distinct from the sequence length T in section 1).
  • Model predicts original token at masked positions only.
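A numpy sketch of the masking forward process, using a sentinel id for [MASK] (the sentinel value and function names are assumptions):

```python
import numpy as np

MASK = -1  # sentinel id for the [MASK] token, assumed not a valid category

def mask_prob(t, timesteps=1000):
    """Linear masking schedule: p(t) = t / timesteps."""
    return t / timesteps

def q_sample_discrete(tokens, t, timesteps=1000, rng=None):
    """Replace each token with MASK independently with probability p(t).
    Returns the corrupted tokens and the boolean mask of corrupted positions,
    so the loss can be restricted to masked positions only."""
    rng = rng or np.random.default_rng()
    corrupt = rng.random(tokens.shape) < mask_prob(t, timesteps)
    return np.where(corrupt, MASK, tokens), corrupt
```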

3) Shared backbone + heads

  • Inputs: concatenated continuous projection + discrete embeddings + time embedding.
  • Backbone: GRU or temporal transformer.
  • Heads:
    • Continuous head predicts noise eps.
    • Discrete heads predict logits per discrete feature.
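A torch sketch of the shared-backbone layout, assuming a GRU backbone and a shared embedding table over discrete categories plus one reserved [MASK] id (all sizes and names are illustrative, not the repo's actual API):

```python
import torch
import torch.nn as nn

class HybridBackbone(nn.Module):
    def __init__(self, n_cont=53, n_disc=30, n_classes=2, d_model=128, t_max=1000):
        super().__init__()
        self.cont_proj = nn.Linear(n_cont, d_model)
        # n_classes real categories + 1 reserved [MASK] id (shared table for brevity;
        # a per-feature table would be a straightforward extension).
        self.disc_emb = nn.Embedding(n_classes + 1, d_model)
        self.time_emb = nn.Embedding(t_max, d_model)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.eps_head = nn.Linear(d_model, n_cont)               # predicts noise eps
        self.disc_head = nn.Linear(d_model, n_disc * n_classes)  # logits per discrete feature
        self.n_disc, self.n_classes = n_disc, n_classes

    def forward(self, x_cont, x_disc, t):
        # x_cont: (B,T,n_cont) float; x_disc: (B,T,n_disc) long; t: (B,) long
        h = self.cont_proj(x_cont) + self.disc_emb(x_disc).sum(dim=2)
        h = h + self.time_emb(t)[:, None, :]  # broadcast time embedding over the sequence
        h, _ = self.backbone(h)
        eps = self.eps_head(h)
        logits = self.disc_head(h).view(*h.shape[:2], self.n_disc, self.n_classes)
        return eps, logits
```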

4) Loss

  • Continuous: L_cont = MSE(eps_pred, eps).
  • Discrete: L_disc = CE(logits, target) on masked positions only.
  • Combined: L = lambda * L_cont + (1 - lambda) * L_disc, with lambda in [0, 1].
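The three bullets above can be sketched in numpy as follows (the function name and the stabilized log-softmax are assumptions; `corrupt` is the boolean mask of [MASK]-ed positions):

```python
import numpy as np

def hybrid_loss(eps_pred, eps, logits, targets, corrupt, lam=0.5):
    """L = lam * MSE(eps_pred, eps) + (1 - lam) * CE(logits, targets),
    with cross-entropy taken over masked positions only."""
    l_cont = np.mean((eps_pred - eps) ** 2)
    # Numerically stable log-softmax over the last (class) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    l_disc = nll[corrupt].mean() if corrupt.any() else 0.0
    return lam * l_cont + (1 - lam) * l_disc
```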

5) Training loop (high level)

  1. Load a batch of sequences.
  2. Sample timesteps t.
  3. Apply q_sample_continuous and q_sample_discrete.
  4. Forward model, compute losses.
  5. Backprop + optimizer step.
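The five steps above in one runnable torch sketch, with a toy linear layer per head standing in for the shared backbone and linear betas instead of the cosine schedule for brevity (all names, sizes, and hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, N_CONT, N_DISC, N_CLS, STEPS = 4, 16, 53, 30, 2, 1000
MASK_ID = N_CLS  # extra id reserved for [MASK]
lam = 0.5

# Toy stand-ins for the backbone + heads.
eps_head = torch.nn.Linear(N_CONT, N_CONT)
disc_head = torch.nn.Linear(N_DISC, N_DISC * N_CLS)
opt = torch.optim.Adam(list(eps_head.parameters()) + list(disc_head.parameters()), lr=1e-3)
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, STEPS), dim=0)

# 1. Load a batch of sequences (random data here).
x0 = torch.randn(B, T, N_CONT)
tok = torch.randint(0, N_CLS, (B, T, N_DISC))

# 2. Sample timesteps t, one per sequence.
t = torch.randint(0, STEPS, (B,))

# 3. Apply the continuous and discrete forward processes.
ab = alpha_bar[t].view(B, 1, 1)
eps = torch.randn_like(x0)
x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
corrupt = torch.rand(B, T, N_DISC) < t.view(B, 1, 1).float() / STEPS
tok_t = torch.where(corrupt, torch.full_like(tok, MASK_ID), tok)

# 4. Forward model, compute losses (CE on masked positions only).
eps_pred = eps_head(x_t)
logits = disc_head(tok_t.float()).view(B, T, N_DISC, N_CLS)
l_cont = F.mse_loss(eps_pred, eps)
l_disc = F.cross_entropy(logits[corrupt], tok[corrupt]) if corrupt.any() else torch.zeros(())
loss = lam * l_cont + (1 - lam) * l_disc

# 5. Backprop + optimizer step.
opt.zero_grad()
loss.backward()
opt.step()
```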

6) Sampling (high level)

  • Continuous: standard reverse diffusion from pure noise.
  • Discrete: start from all [MASK] and iteratively refine tokens.
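The discrete bullet can be sketched as the following iterative unmasking loop, with a dummy uniform-logits predictor standing in for the trained model and a re-masking rate that follows the linear schedule (all names are illustrative):

```python
import numpy as np

MASK, N_CLS, STEPS = -1, 2, 10
rng = np.random.default_rng(0)

def dummy_predict(tokens):
    """Placeholder for the trained model: random logits over classes everywhere."""
    return rng.random(tokens.shape + (N_CLS,))

def sample_discrete(shape, steps=STEPS):
    tokens = np.full(shape, MASK)                # start from all [MASK]
    for t in range(steps, 0, -1):
        logits = dummy_predict(tokens)
        pred = logits.argmax(axis=-1)            # model's guess at every position
        masked = tokens == MASK
        tokens = np.where(masked, pred, tokens)  # fill currently masked positions
        # Re-mask down to the schedule's rate p(t-1) = (t-1)/steps;
        # at t = 1 this is 0, so the final output contains no [MASK].
        remask = rng.random(shape) < (t - 1) / steps
        tokens = np.where(remask, MASK, tokens)
    return tokens
```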

7) Files in this example

  • feature_split.json: column split for HAI 21.03.
  • hybrid_diffusion.py: model + diffusion utilities.
  • train_stub.py: end-to-end scaffold for loss computation.