# Hybrid Diffusion Design (HAI 21.03)

## 1) Data representation

- Input sequence length: T (e.g., 64 or 128 time steps).
- Continuous features: 53 columns (sensor/process values).
- Discrete features: 30 columns (binary or low-cardinality states + attack labels).
- Time column: `time` is excluded from modeling; use index-based position/time embeddings.

## 2) Forward processes

### Continuous (Gaussian DDPM)

- Use a cosine beta schedule with `timesteps=1000`.
- Forward: `x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps`, where `a_bar_t` is the cumulative product of `alpha_t = 1 - beta_t`.

### Discrete (mask diffusion)

- Use `[MASK]` replacement with probability `p(t)`.
- Simple linear schedule: `p(t) = t / timesteps` (note: the denominator is the number of diffusion steps, not the sequence length `T`).
- The model predicts the original token at masked positions only.

## 3) Shared backbone + heads

- Inputs: concatenated continuous projection + discrete embeddings + diffusion-time embedding.
- Backbone: GRU or temporal transformer.
- Heads:
  - Continuous head predicts the noise `eps`.
  - Discrete heads predict logits per discrete feature.

## 4) Loss

- Continuous: `L_cont = MSE(eps_pred, eps)`.
- Discrete: `L_disc = CE(logits, target)`, computed on masked positions only.
- Combined: `L = lambda * L_cont + (1 - lambda) * L_disc`, with `lambda` in `[0, 1]`.

## 5) Training loop (high level)

1. Load a batch of sequences.
2. Sample timesteps `t`.
3. Apply `q_sample_continuous` and `q_sample_discrete`.
4. Forward the model and compute both losses.
5. Backprop + optimizer step.

## 6) Sampling (high level)

- Continuous: standard reverse diffusion starting from pure noise.
- Discrete: start from all `[MASK]` and iteratively unmask/refine tokens.

## 7) Files in this example

- `feature_split.json`: column split for HAI 21.03.
- `hybrid_diffusion.py`: model + diffusion utilities.
- `train_stub.py`: end-to-end scaffold for loss computation.
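The continuous forward process in section 2 can be sketched as follows. This is a minimal NumPy version for clarity (the actual utilities in `hybrid_diffusion.py` presumably use PyTorch); the function names match the design, but their exact signatures are assumptions.

```python
import numpy as np

def cosine_alpha_bar(timesteps: int, s: float = 0.008) -> np.ndarray:
    """Cosine schedule: a_bar_t for t = 1..timesteps, decreasing from ~1 toward 0."""
    steps = np.arange(timesteps + 1, dtype=np.float64)
    f = np.cos((steps / timesteps + s) / (1 + s) * np.pi / 2) ** 2
    return f[1:] / f[0]  # normalize so a_bar at t=0 would be exactly 1

def q_sample_continuous(x0, t, alpha_bar, rng):
    """Noise x0 to step t: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast over (T, feat)
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps  # eps is the regression target for the continuous head
```

Returning `eps` alongside `x_t` keeps the training loop simple: the continuous head's MSE target is exactly this sampled noise.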
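The discrete mask-diffusion forward process can be sketched the same way. Here `mask_id` is a reserved extra token id per feature standing in for `[MASK]`; the returned boolean mask marks exactly the positions where the cross-entropy loss applies.

```python
import numpy as np

def q_sample_discrete(tokens, t, timesteps, mask_id, rng):
    """Independently replace each token with mask_id with prob p(t) = t / timesteps."""
    p = (t.astype(np.float64) / timesteps).reshape(-1, *([1] * (tokens.ndim - 1)))
    mask = rng.random(tokens.shape) < p          # True where the token is corrupted
    corrupted = np.where(mask, mask_id, tokens)
    return corrupted, mask  # mask selects positions for the CE loss
```

At `t = 0` nothing is masked and at `t = timesteps` everything is, matching the linear schedule in section 2.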
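The combined loss in section 4 can be written in a few lines. This NumPy sketch assumes per-feature logits of shape `(..., vocab)` and a boolean mask from the discrete forward process; the stable log-softmax is spelled out since the real code would likely use a framework's built-in cross-entropy.

```python
import numpy as np

def hybrid_loss(eps_pred, eps, logits, targets, mask, lam=0.5):
    """L = lam * MSE(eps_pred, eps) + (1 - lam) * CE at masked positions only."""
    l_cont = np.mean((eps_pred - eps) ** 2)
    # Numerically stable log-softmax over the vocab axis.
    m = logits.max(axis=-1, keepdims=True)
    logp = logits - (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))
    nll = -np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    l_disc = nll[mask].mean() if mask.any() else 0.0
    return lam * l_cont + (1 - lam) * l_disc
```

Averaging the CE over masked positions only (rather than all positions) keeps the discrete term on a comparable scale across timesteps, since early timesteps mask very few tokens.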
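The discrete sampling loop in section 6 can be sketched as iterative unmasking: start all-`[MASK]`, and at each step predict tokens and re-mask a shrinking random fraction. `predict_logits` is a hypothetical stand-in for a call into the trained model; one simple design choice is to keep the most confident predictions and re-mask the rest, but random re-masking (shown here) is the simplest variant.

```python
import numpy as np

def sample_discrete(predict_logits, seq_len, n_feat, mask_id, steps, rng):
    """Start from all [MASK]; each round, predict tokens and unmask a growing share."""
    tokens = np.full((seq_len, n_feat), mask_id, dtype=np.int64)
    for step in range(steps, 0, -1):
        logits = predict_logits(tokens, step)   # model call; (seq_len, n_feat, vocab)
        pred = logits.argmax(axis=-1)
        # Re-mask a fraction (step - 1) / steps, shrinking to 0 at the final step.
        keep_masked = rng.random(tokens.shape) < (step - 1) / steps
        tokens = np.where(keep_masked, mask_id, pred)
    return tokens
```

At the final step the re-mask probability is zero, so the output is guaranteed to contain no `[MASK]` tokens.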