# Hybrid Diffusion Design (HAI 21.03)
## 1) Data representation
- Input sequence length: T (e.g., 64 or 128 time steps).
- Continuous features: 53 columns (sensor/process values).
- Discrete features: 30 columns (binary or low-cardinality states + attack labels).
- Time column: `time` is excluded from modeling; use index-based position/time embeddings.
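To make the split concrete, here is a minimal loading sketch in Python. The key names (`continuous`, `discrete`), the example column names, and the CSV filename are illustrative assumptions, not the actual contents of `feature_split.json`.

```python
import json

import pandas as pd

# Hypothetical structure for feature_split.json; the real keys and
# column names may differ:
# {"continuous": ["P1_B2004", ...], "discrete": ["P1_PP01AD", ...]}
with open("feature_split.json") as f:
    split = json.load(f)

df = pd.read_csv("hai_21_03_train.csv")        # assumed filename
x_cont = df[split["continuous"]].to_numpy()    # (N, 53) floats
x_disc = df[split["discrete"]].to_numpy()      # (N, 30) integer codes
# `time` is dropped entirely; positions come from the row index.
```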
## 2) Forward processes
### Continuous (Gaussian DDPM)
- Use a cosine beta schedule with `timesteps=1000`.
- Forward: `x_t = sqrt(a_bar_t) * x_0 + sqrt(1-a_bar_t) * eps`.
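A minimal sketch of this forward step in PyTorch, assuming the cosine schedule from Nichol & Dhariwal (2021); the name `q_sample_continuous` matches the training loop below, while the shapes and clamping are illustrative choices.

```python
import math

import torch

def cosine_alpha_bar(timesteps: int = 1000, s: float = 0.008) -> torch.Tensor:
    """Cumulative alpha_bar_t under the cosine schedule."""
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return (f / f[0])[1:].clamp(1e-5, 1.0)        # (timesteps,)

ALPHA_BAR = cosine_alpha_bar()

def q_sample_continuous(x0: torch.Tensor, t: torch.Tensor):
    """x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.

    x0: (B, T, 53) normalized continuous features; t: (B,) long timesteps.
    """
    eps = torch.randn_like(x0)
    a_bar = ALPHA_BAR[t].view(-1, 1, 1)           # broadcast over (T, 53)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return x_t, eps
```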
### Discrete (mask diffusion)
- Use `[MASK]` replacement with probability `p(t)`.
- Simple schedule: `p(t) = t / T`, where `T` here denotes the number of diffusion timesteps, not the sequence length from section 1.
- Model predicts original token at masked positions only.
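A matching sketch of the masking process, under the simplifying assumption that every feature shares one reserved `[MASK]` index; a real implementation may reserve one per vocabulary.

```python
import torch

def q_sample_discrete(tokens: torch.Tensor, t: torch.Tensor,
                      mask_id: int, timesteps: int = 1000):
    """Replace each token with `mask_id` independently with prob p(t) = t / T.

    tokens: (B, T, 30) long integer codes; t: (B,) long diffusion timesteps.
    Returns the corrupted tokens and the boolean mask of corrupted positions.
    """
    p = (t.float() / timesteps).view(-1, 1, 1)    # (B, 1, 1), broadcasts
    masked = torch.rand_like(tokens, dtype=torch.float) < p
    x_t = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    return x_t, masked
```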
## 3) Shared backbone + heads
- Inputs: concatenated continuous projection + discrete embeddings + time embedding.
- Backbone: GRU or temporal transformer.
- Heads:
  - Continuous head predicts the noise `eps`.
  - Discrete heads predict logits per discrete feature.
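A compact PyTorch sketch of one possible backbone (the GRU variant). Dimensions, vocabulary sizes, and the learned step embedding are illustrative, and embeddings are summed rather than concatenated purely to keep the sketch short.

```python
import torch
import torch.nn as nn

class HybridBackbone(nn.Module):
    """GRU backbone with a noise head and per-feature classification heads."""

    def __init__(self, n_cont=53, disc_vocab_sizes=(3,) * 30,
                 d_model=128, t_max=1000):
        super().__init__()
        self.cont_proj = nn.Linear(n_cont, d_model)
        # One embedding table per discrete feature (+1 slot for [MASK]).
        self.disc_emb = nn.ModuleList(
            [nn.Embedding(v + 1, d_model) for v in disc_vocab_sizes])
        self.t_emb = nn.Embedding(t_max, d_model)  # diffusion-step embedding
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.eps_head = nn.Linear(d_model, n_cont)
        self.disc_heads = nn.ModuleList(
            [nn.Linear(d_model, v) for v in disc_vocab_sizes])

    def forward(self, x_cont, x_disc, t):
        # x_cont: (B, T, 53) float; x_disc: (B, T, 30) long; t: (B,) long.
        h = self.cont_proj(x_cont)
        for i, emb in enumerate(self.disc_emb):
            h = h + emb(x_disc[..., i])
        h = h + self.t_emb(t)[:, None, :]          # broadcast over time axis
        h, _ = self.backbone(h)
        eps_pred = self.eps_head(h)                # (B, T, 53)
        logits = [head(h) for head in self.disc_heads]  # each (B, T, V_i)
        return eps_pred, logits
```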
## 4) Loss
- Continuous: `L_cont = MSE(eps_pred, eps)`.
- Discrete: `L_disc = CE(logits, target)` on masked positions only.
- Combined: `L = lambda * L_cont + (1 - lambda) * L_disc`.
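A sketch of the combined objective under the same assumptions; `lambda_` is the weighting above, and the CE is averaged over masked positions only.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(eps_pred, eps, logits, targets, masked, lambda_=0.5):
    """L = lambda * MSE(eps_pred, eps) + (1 - lambda) * masked CE.

    logits: list of (B, T, V_i) per discrete feature; targets: (B, T, 30) long;
    masked: (B, T, 30) bool from q_sample_discrete.
    """
    l_cont = F.mse_loss(eps_pred, eps)
    ce_terms = []
    for i, lg in enumerate(logits):
        m = masked[..., i]
        if m.any():                                # skip features with no masks
            ce_terms.append(F.cross_entropy(lg[m], targets[..., i][m]))
    l_disc = torch.stack(ce_terms).mean() if ce_terms else eps_pred.new_zeros(())
    return lambda_ * l_cont + (1 - lambda_) * l_disc
```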
## 5) Training loop (high level)
1. Load a batch of sequences.
2. Sample timesteps `t`.
3. Apply `q_sample_continuous` and `q_sample_discrete`.
4. Forward pass through the model; compute both losses.
5. Backprop + optimizer step.
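Putting the steps together, a minimal training step; the helper names mirror the sketches above, and the `loader`/optimizer setup is standard PyTorch boilerplate (construction omitted).

```python
import torch

model = HybridBackbone()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
TIMESTEPS = 1000
MASK_ID = 3  # one past the largest real code, matching the +1 embedding slot

for x_cont, x_disc in loader:                          # step 1: (B, T, *) batches
    t = torch.randint(0, TIMESTEPS, (x_cont.size(0),))  # step 2
    x_cont_t, eps = q_sample_continuous(x_cont, t)       # step 3
    x_disc_t, masked = q_sample_discrete(x_disc, t, MASK_ID)
    eps_pred, logits = model(x_cont_t, x_disc_t, t)      # step 4
    loss = hybrid_loss(eps_pred, eps, logits, x_disc, masked)
    opt.zero_grad()
    loss.backward()                                      # step 5
    opt.step()
```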
## 6) Sampling (high level)
- Continuous: standard reverse diffusion from pure noise.
- Discrete: start from all `[MASK]` and iteratively refine tokens.
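For the discrete side, one possible unmasking loop, sketched under the assumption that the continuous channel is held fixed as conditioning; a full joint sampler would update both channels in lockstep, and the re-masking rule here (random, tracking `p(t)`) is one common choice rather than something this design fixes.

```python
import torch

@torch.no_grad()
def sample_discrete(model, x_cont_t, seq_len=64, n_disc=30,
                    mask_id=3, timesteps=1000, n_steps=20):
    """Start from all [MASK]; alternately predict tokens and re-mask
    a shrinking random fraction so the mask rate tracks p(t) = t / T."""
    B = x_cont_t.size(0)
    x = torch.full((B, seq_len, n_disc), mask_id, dtype=torch.long)
    for step in reversed(range(1, n_steps + 1)):
        t = torch.full((B,), step * timesteps // n_steps - 1, dtype=torch.long)
        _, logits = model(x_cont_t, x, t)
        pred = torch.stack([lg.argmax(-1) for lg in logits], dim=-1)
        # Re-mask a random subset at the next step's mask rate; at the final
        # step p_next = 0, so the output is fully unmasked.
        p_next = (step - 1) / n_steps
        remask = torch.rand_like(pred, dtype=torch.float) < p_next
        x = torch.where(remask, torch.full_like(pred, mask_id), pred)
    return x
```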
## 7) Files in this example
- `feature_split.json`: column split for HAI 21.03.
- `hybrid_diffusion.py`: model + diffusion utilities.
- `train_stub.py`: end-to-end scaffold for loss computation.