Update example and notes

2026-01-09 02:14:20 +08:00
parent 200bdf6136
commit c0639386be
18 changed files with 31656 additions and 0 deletions

example/model_design.md Normal file

@@ -0,0 +1,45 @@
# Hybrid Diffusion Design (HAI 21.03)
## 1) Data representation
- Input sequence length: T (e.g., 64 or 128 time steps).
- Continuous features: 53 columns (sensor/process values).
- Discrete features: 30 columns (binary or low-cardinality states + attack labels).
- Time column: `time` is excluded from modeling; use index-based position/time embeddings.
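A minimal sketch of the column split described above. The structure of `feature_split.json` (a `"continuous"` list and a `"discrete"` list) and the column names in the toy row are assumptions for illustration; only the 53/30 split and the excluded `time` column come from this design.

```python
# Sketch: split one HAI 21.03 record into continuous and discrete parts.
# Column names below are hypothetical; feature_split.json's exact schema
# is an assumption (two lists keyed "continuous" and "discrete").
import json

def load_feature_split(path):
    """Read {"continuous": [...], "discrete": [...]} from feature_split.json."""
    with open(path) as f:
        split = json.load(f)
    return split["continuous"], split["discrete"]

def split_row(row, cont_cols, disc_cols):
    """Split one record (column -> value dict), dropping the `time` column."""
    cont = [float(row[c]) for c in cont_cols]
    disc = [int(row[c]) for c in disc_cols]
    return cont, disc

# Toy example with made-up column names:
row = {"time": "2021-01-01 00:00:00", "P1_FCV01D": 0.42, "attack": 0}
cont, disc = split_row(row, ["P1_FCV01D"], ["attack"])
```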
## 2) Forward processes
### Continuous (Gaussian DDPM)
- Use cosine beta schedule with `timesteps=1000`.
- Forward: `x_t = sqrt(a_bar_t) * x_0 + sqrt(1-a_bar_t) * eps`.
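The continuous forward process can be sketched directly from the formula above. The cosine schedule follows the standard `alpha_bar(t) = f(t)/f(0)` with `f(t) = cos^2(((t/T) + s)/(1 + s) * pi/2)`; helper names and array shapes here are assumptions, not the repo's API.

```python
# Sketch: Gaussian forward process q(x_t | x_0) with a cosine beta schedule.
import numpy as np

def cosine_alpha_bar(timesteps, s=0.008):
    """alpha_bar_t for t = 1..timesteps (Nichol & Dhariwal cosine schedule)."""
    t = np.arange(timesteps + 1)
    f = np.cos(((t / timesteps) + s) / (1 + s) * np.pi / 2) ** 2
    return f[1:] / f[0]

def q_sample_continuous(x0, t, alpha_bar, rng):
    """x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, per-sample t."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t][:, None, None]        # (B,) -> (B, 1, 1) for broadcasting
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return x_t, eps

alpha_bar = cosine_alpha_bar(1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 64, 53))       # (batch, T, continuous features)
t = rng.integers(0, 1000, size=4)
x_t, eps = q_sample_continuous(x0, t, alpha_bar, rng)
```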
### Discrete (mask diffusion)
- Use `[MASK]` replacement with probability `p(t)`.
- Simple linear schedule: `p(t) = t / timesteps` (the denominator is the total number of diffusion steps, not the sequence length `T`).
- Model predicts original token at masked positions only.
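The mask forward process above can be sketched in a few lines. The reserved `[MASK]` id and the array shapes are assumptions for illustration.

```python
# Sketch: mask diffusion forward process. Each token is independently
# replaced by MASK_ID with probability p(t) = t / timesteps.
import numpy as np

MASK_ID = -1  # assumed reserved id, outside every feature's vocabulary

def q_sample_discrete(tokens, t, timesteps, rng):
    """Return masked tokens and the boolean mask of replaced positions."""
    p = t / timesteps                                # (B,) mask probabilities
    mask = rng.random(tokens.shape) < p[:, None, None]
    x_t = np.where(mask, MASK_ID, tokens)
    return x_t, mask

rng = np.random.default_rng(0)
tokens = rng.integers(0, 2, size=(4, 64, 30))        # (batch, T, discrete features)
t = np.array([0, 500, 1000, 1000])
x_t, mask = q_sample_discrete(tokens, t, 1000, rng)
# t = 0 leaves the sequence intact; t = timesteps masks every position.
```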
## 3) Shared backbone + heads
- Inputs: concatenated continuous projection + discrete embeddings + time embedding.
- Backbone: GRU or temporal transformer.
- Heads:
- Continuous head predicts noise `eps`.
- Discrete heads predict logits per discrete feature.
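A shape-level sketch of how the backbone input is assembled from the three streams above. All dimensions, parameter names, and the sinusoidal time embedding are illustrative assumptions; the real model lives in `hybrid_diffusion.py`.

```python
# Sketch: continuous projection + per-feature discrete embeddings +
# diffusion-time embedding, concatenated into the backbone input.
import numpy as np

rng = np.random.default_rng(0)
B, T, C, D = 4, 64, 53, 30       # batch, sequence length, cont./disc. features
H, E = 128, 8                    # hidden size, per-feature embedding size

W_cont = rng.standard_normal((C, H)) * 0.02      # continuous projection
embeds = rng.standard_normal((D, 3, E)) * 0.02   # one table per discrete
                                                 # feature (vocab 3: 0, 1, MASK)

def sinusoidal_time_emb(t, dim):
    """Standard sinusoidal embedding of the diffusion timestep."""
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
    ang = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)   # (B, dim)

x_cont = rng.standard_normal((B, T, C))
x_disc = rng.integers(0, 3, size=(B, T, D))
t = rng.integers(0, 1000, size=B)

h_cont = x_cont @ W_cont                                      # (B, T, H)
h_disc = np.concatenate(
    [embeds[j, x_disc[:, :, j]] for j in range(D)], axis=-1)  # (B, T, D*E)
h_time = np.broadcast_to(sinusoidal_time_emb(t, H)[:, None, :], (B, T, H))
h = np.concatenate([h_cont, h_disc, h_time], axis=-1)         # backbone input
```

From here, `h` would feed the GRU or temporal transformer, whose per-step outputs drive the noise head and the per-feature logit heads.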
## 4) Loss
- Continuous: `L_cont = MSE(eps_pred, eps)`.
- Discrete: `L_disc = CE(logits, target)` on masked positions only.
- Combined: `L = lambda * L_cont + (1 - lambda) * L_disc`.
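The combined objective can be sketched as below; `lam = 0.5` is just a placeholder for the free hyperparameter `lambda`, and the tiny arrays are synthetic.

```python
# Sketch: MSE on the noise plus masked-only cross-entropy, mixed by lambda.
import numpy as np

def mse(eps_pred, eps):
    return np.mean((eps_pred - eps) ** 2)

def masked_ce(logits, targets, mask):
    """Mean cross-entropy over masked positions; logits (..., K), int targets."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    return (nll * mask).sum() / np.maximum(mask.sum(), 1)

lam = 0.5                                       # placeholder lambda
eps_pred = np.zeros((2, 4, 3)); eps = np.ones((2, 4, 3))
logits = np.zeros((2, 4, 2))                    # uniform over K = 2 classes
targets = np.zeros((2, 4), dtype=int)
mask = np.ones((2, 4), dtype=bool)
loss = lam * mse(eps_pred, eps) + (1 - lam) * masked_ce(logits, targets, mask)
```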
## 5) Training loop (high level)
1. Load a batch of sequences.
2. Sample timesteps `t`.
3. Apply `q_sample_continuous` and `q_sample_discrete`.
4. Forward model, compute losses.
5. Backprop + optimizer step.
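Steps 1-4 above can be wired together as follows; step 5 (backprop and the optimizer step) is framework-specific and only indicated by a comment. Every helper here is a simplified stand-in for the repo's real utilities, and the model outputs are stubs.

```python
# Sketch: one training iteration with stub model outputs (no real gradients).
import numpy as np

rng = np.random.default_rng(0)
timesteps, B, T, C, D, MASK_ID = 1000, 4, 64, 53, 30, -1

x0_cont = rng.standard_normal((B, T, C))          # 1. load a batch
x0_disc = rng.integers(0, 2, size=(B, T, D))
t = rng.integers(1, timesteps + 1, size=B)        # 2. sample timesteps

# 3. forward processes (alpha_bar evaluated by the unnormalized cosine form)
ab = np.cos(((t / timesteps) + 0.008) / 1.008 * np.pi / 2) ** 2
eps = rng.standard_normal(x0_cont.shape)
x_t_cont = (np.sqrt(ab)[:, None, None] * x0_cont
            + np.sqrt(1 - ab)[:, None, None] * eps)
mask = rng.random(x0_disc.shape) < (t / timesteps)[:, None, None]
x_t_disc = np.where(mask, MASK_ID, x0_disc)

eps_pred = np.zeros_like(eps)                     # 4. stub model outputs
l_cont = np.mean((eps_pred - eps) ** 2)
l_disc = np.log(2.0)          # CE of uniform logits over a binary vocabulary
loss = 0.5 * l_cont + 0.5 * l_disc
# 5. loss.backward(); optimizer.step()  -- in the actual training framework
```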
## 6) Sampling (high level)
- Continuous: standard reverse diffusion from pure noise.
- Discrete: start from all `[MASK]` and iteratively refine tokens.
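The discrete sampler above can be sketched as iterative unmasking: start from all `[MASK]` and, at each step, commit the model's most confident predictions so the masked fraction shrinks linearly. The stub `predict_logits` (random binary logits) stands in for the trained discrete heads and is purely an assumption.

```python
# Sketch: confidence-based iterative unmasking from an all-[MASK] start.
import numpy as np

MASK_ID = -1
rng = np.random.default_rng(0)

def predict_logits(x_t):
    """Stub head: random logits over a binary vocabulary (illustrative only)."""
    return rng.standard_normal(x_t.shape + (2,))

def sample_discrete(shape, steps):
    x = np.full(shape, MASK_ID)
    for step in range(steps, 0, -1):
        logits = predict_logits(x)
        conf = logits.max(axis=-1)
        pred = logits.argmax(axis=-1)
        # Unmask still-masked positions, most confident first, so roughly
        # (step - 1) / steps of all positions stay masked afterwards.
        still = (x == MASK_ID)
        keep = int(x.size * (step - 1) / steps)    # how many stay masked
        conf_flat = np.where(still.ravel(), conf.ravel(), -np.inf)
        order = np.argsort(-conf_flat)             # most confident first
        to_fill = order[: still.sum() - keep]
        x.flat[to_fill] = pred.ravel()[to_fill]
    return x

x = sample_discrete((1, 16, 4), steps=8)
```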
## 7) Files in this example
- `feature_split.json`: column split for HAI 21.03.
- `hybrid_diffusion.py`: model + diffusion utilities.
- `train_stub.py`: end-to-end scaffold for loss computation.