mask-ddpm Project Report (Detailed)
This report is a complete, beginner‑friendly description of the current project implementation as of the latest code in this repo. It explains what the project does, how data flows, what each file is for, and why the architecture is designed this way.
0. TL;DR
We generate multivariate ICS time-series by (1) learning the temporal trend with a GRU and (2) learning residuals with a hybrid diffusion model (continuous DDPM + discrete masked diffusion). We then evaluate with tie-aware KS and apply type-aware postprocessing for diagnostic KS reduction.
1. Project Goal
We want synthetic ICS sequences that are:
- Distribution‑aligned (per‑feature CDF matches real data → low KS)
- Temporally consistent (lag‑1 correlation and trend are realistic)
- Discrete‑valid (state tokens are legal and frequency‑consistent)
This is hard because distribution and temporal structure often conflict in a single model.
2. Data & Feature Schema
Input data: HAI CSV files (compressed) in dataset/hai/hai-21.03/.
Feature split: example/feature_split.json
- continuous: real-valued sensors/actuators
- discrete: state tokens / modes
- time_column: time index (not trained)
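For illustration, a minimal feature_split.json could look like the sketch below; the tag names are hypothetical placeholders, not actual HAI column names:

```json
{
  "continuous": ["P1_SENSOR_A", "P1_SENSOR_B"],
  "discrete": ["P1_PUMP_STATE"],
  "time_column": "time"
}
```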
3. Preprocessing
File: example/prepare_data.py
Continuous features
- Mean/std statistics
- Quantile table (if use_quantile_transform=true)
- Optional transforms (log1p etc.)
- Output: example/results/cont_stats.json
Discrete features
- Token vocab from data
- Output: example/results/disc_vocab.json
File: example/data_utils.py contains
- Normalization / inverse
- Quantile transform / inverse
- Post‑calibration helpers
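The transform/inverse pairs can be sketched as follows; the function names are illustrative, not the actual data_utils.py API:

```python
import numpy as np

def normalize(x, mean, std):
    # z-score; std is floored to avoid division by zero
    return (x - mean) / max(std, 1e-8)

def denormalize(z, mean, std):
    return z * max(std, 1e-8) + mean

def quantile_transform(x, q_table):
    # Map raw values to [0, 1] via the empirical CDF stored in q_table
    # (q_table: sorted quantile values estimated from the real data).
    return np.interp(x, q_table, np.linspace(0.0, 1.0, len(q_table)))

def inverse_quantile_transform(u, q_table):
    # Inverse empirical CDF: map [0, 1] back to the data scale
    return np.interp(u, np.linspace(0.0, 1.0, len(q_table)), q_table)
```

The round trip (transform then inverse) recovers the original value, which is what sampling relies on when mapping generated quantiles back to raw units.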
4. Architecture
4.1 Stage‑1 Temporal GRU (Trend)
File: example/hybrid_diffusion.py
- Class: TemporalGRUGenerator
- Input: continuous sequence
- Output: trend sequence (teacher forced)
- Purpose: capture temporal structure
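The Stage-1 idea can be sketched in a few lines of PyTorch; TrendGRU and its sizes are illustrative stand-ins for the repo's TemporalGRUGenerator:

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Teacher-forced next-step trend predictor (sketch)."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        h, _ = self.gru(x)
        return self.head(h)

torch.manual_seed(0)
model = TrendGRU(n_features=8)
x = torch.randn(4, 96, 8)              # a hypothetical normalized batch
pred = model(x[:, :-1])                # predict steps 1..95 from steps 0..94
loss = nn.functional.mse_loss(pred, x[:, 1:])
```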
4.2 Stage‑2 Hybrid Diffusion (Residual)
File: example/hybrid_diffusion.py
Continuous branch
- Gaussian DDPM
- Predicts residual (or noise)
Discrete branch
- Mask diffusion (masked tokens)
- Classifier head per discrete column
Backbone
- The current config uses a Transformer encoder (backbone_type=transformer)
- A GRU backbone is still supported as an option
Conditioning
- File-id conditioning (use_condition=true, condition_type=file_id)
- Type-1 (setpoint/demand) features can be passed as a continuous condition (cond_cont)
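The discrete branch's forward corruption (masking) can be sketched as follows; the linear schedule and the MASK_ID sentinel are illustrative assumptions, not the repo's exact choices:

```python
import numpy as np

MASK_ID = -1  # illustrative sentinel; real code would reserve a vocab slot

def mask_tokens(tokens, t, T, rng):
    # Forward corruption: the masked fraction grows from 0 at t=0 to 1 at t=T.
    p = t / T
    mask = rng.random(tokens.shape) < p
    corrupted = np.where(mask, MASK_ID, tokens)
    return corrupted, mask  # cross-entropy is computed on masked positions only

rng = np.random.default_rng(0)
tokens = rng.integers(0, 5, size=(2, 96))   # two sequences, vocab of 5 states
corrupted, mask = mask_tokens(tokens, t=300, T=600, rng=rng)
```

At sampling time the process runs in reverse: start fully masked and let the per-column classifier head fill tokens in.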
5. Training Flow
File: example/train.py
5.1 Stage‑1 Temporal training
- Use continuous features (excluding Type1/Type5)
- Teacher‑forced GRU predicts next step
- Loss: MSE
- Output: temporal.pt
5.2 Stage‑2 Diffusion training
- Compute the residual: x_resid = x_cont - trend
- Sample a time step t
- Add noise for continuous features; mask tokens for discrete features
- Model predicts:
  - eps_pred for the continuous residual
  - logits for the discrete tokens
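The continuous corruption follows the standard DDPM closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε applied to the residual. A numpy sketch with an assumed linear beta schedule:

```python
import numpy as np

T = 600                                       # matches timesteps in the config
betas = np.linspace(1e-4, 0.02, T)            # assumed linear beta schedule
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    # Closed-form sample of x_t ~ q(x_t | x_0)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps  # eps is the regression target when cont_target=eps

rng = np.random.default_rng(0)
x_resid = rng.standard_normal((4, 96, 8))     # hypothetical residual batch
t = 300
xt, eps = q_sample(x_resid, t, rng)
```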
Loss design
- Continuous loss: MSE on eps or x0 (cont_target)
- Optional weighting: inverse variance (cont_loss_weighting=inv_std)
- Optional SNR weighting (snr_weighted_loss)
- Optional quantile loss (aligns the residual distribution)
- Optional residual mean/std loss
- Discrete loss: cross‑entropy on masked tokens
- Total: loss = λ * loss_cont + (1 − λ) * loss_disc
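The combined objective can be sketched as below; λ = 0.7 and the mask-averaged cross-entropy are illustrative choices, not the repo's exact implementation (the optional weighting terms are omitted):

```python
import numpy as np

def hybrid_loss(eps_pred, eps_true, logits, targets, mask, lam=0.7):
    # Continuous branch: plain MSE (inv_std / SNR weighting omitted for clarity)
    loss_cont = np.mean((eps_pred - eps_true) ** 2)
    # Discrete branch: cross-entropy averaged over masked positions only
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    loss_disc = -(picked * mask).sum() / max(mask.sum(), 1)
    return lam * loss_cont + (1 - lam) * loss_disc

rng = np.random.default_rng(0)
eps = rng.standard_normal((2, 4, 3))
logits = rng.standard_normal((2, 4, 5))
targets = rng.integers(0, 5, size=(2, 4))
total = hybrid_loss(eps, eps, logits, targets, np.ones((2, 4)))
```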
6. Sampling & Export
File: example/export_samples.py
Steps:
- Initialize continuous with noise
- Initialize discrete with masks
- Reverse diffusion loop from t = T down to 0
- Add the trend back (if the temporal stage is enabled)
- Inverse transforms (quantile → raw)
- Clip/bound if configured
- Merge back Type1 (conditioning) and Type5 (derived)
- Write generated.csv
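The reverse loop for the continuous branch alone can be sketched as standard DDPM ancestral sampling; the stand-in denoiser and the shapes are illustrative:

```python
import numpy as np

T = 600
betas = np.linspace(1e-4, 0.02, T)            # assumed linear beta schedule
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def p_sample_loop(model, shape, rng):
    # Ancestral sampling: start from pure noise, denoise t = T-1 .. 0
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        eps_hat = model(x, t)                 # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alphas_bar[t])
        mean = (x - coef * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # residual sample; the GRU trend is added back afterwards

rng = np.random.default_rng(0)
dummy_model = lambda x, t: np.zeros_like(x)   # placeholder denoiser
sample = p_sample_loop(dummy_model, (2, 96, 4), rng)
```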
7. Evaluation
File: example/evaluate_generated.py
Metrics
- KS (tie‑aware) for continuous
- JSD for discrete
- lag‑1 correlation for temporal consistency
- quantile diffs, mean/std errors
Important
- The reference path supports glob patterns and aggregates all matched files
- The KS implementation is tie-aware (correct for spiky/quantized data)
Outputs: example/results/eval.json
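One way to make KS tie-aware is to evaluate both empirical CDFs on the pooled sample with right-continuous counting, so repeated (quantized) values are handled exactly; this sketch is not necessarily the repo's implementation:

```python
import numpy as np

def ks_tie_aware(a, b):
    # Evaluate both empirical CDFs at every pooled data point.
    # side="right" counts ties correctly, so spiky/quantized features
    # do not distort the statistic.
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()
```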
8. Diagnostics
- example/diagnose_ks.py: CDF plots and per-feature KS
- example/ranked_ks.py: ranked KS + contribution
- example/filtered_metrics.py: filtered KS excluding outliers
- example/program_stats.py: Type-1 stats
- example/controller_stats.py: Type-2 stats
- example/actuator_stats.py: Type-3 stats
- example/pv_stats.py: Type-4 stats
- example/aux_stats.py: Type-6 stats
9. Type-Aware Modeling
To reduce KS error that is dominated by a few variables, the project uses the Type categories defined in the config:
- Type1: setpoints / demand (schedule‑driven)
- Type2: controller outputs
- Type3: actuator positions
- Type4: PV sensors
- Type5: derived tags
- Type6: auxiliary / coupling
Current implementation (diagnostic KS baseline)
File: example/postprocess_types.py
- Type1/2/3/5/6 → empirical resampling from real distribution
- Type4 → keep diffusion output
This is not the final model, but it provides a best-case KS bound for diagnosis.
Outputs:
- example/results/generated_post.csv
- example/results/eval_post.json
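The empirical resampling step can be sketched as a rank-preserving remap onto the real marginal, which drives per-feature KS toward zero while keeping the generated ordering; illustrative, not the exact postprocess_types.py code:

```python
import numpy as np

def empirical_resample(gen_col, real_col):
    # Rank-preserving remap: the i-th smallest generated value is replaced by
    # the value at the matching quantile of the real column, so the marginal
    # matches the real data while the generated ordering is preserved.
    ranks = np.argsort(np.argsort(gen_col))
    real_sorted = np.sort(real_col)
    idx = (ranks / len(gen_col) * len(real_col)).astype(int)
    return real_sorted[np.clip(idx, 0, len(real_col) - 1)]

rng = np.random.default_rng(1)
gen = rng.normal(size=100)                    # hypothetical generated column
real = rng.uniform(size=1000)                 # hypothetical real column
out = empirical_resample(gen, real)
```

Because only per-column marginals are remapped, cross-feature structure is not preserved, which is exactly why this is a diagnostic baseline rather than a model.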
10. Pipeline (One-Command Run)
File: example/run_all.py
Default pipeline:
- prepare_data
- train
- export_samples
- evaluate_generated (generated.csv)
- postprocess_types (generated_post.csv)
- evaluate_generated (eval_post.json)
- diagnostics scripts
Linux:
python example/run_all.py --device cuda --config example/config.json
Windows (PowerShell, run from the example/ directory):
python run_all.py --device cuda --config config.json
11. Current Configuration (Key Defaults)
From example/config.json:
- backbone_type: transformer
- timesteps: 600
- seq_len: 96
- batch_size: 16
- cont_target: x0
- cont_loss_weighting: inv_std
- snr_weighted_loss: true
- quantile_loss_weight: 0.2
- use_quantile_transform: true
- cont_post_calibrate: true
- use_temporal_stage1: true
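For reference, these defaults correspond to a config.json fragment along these lines (other keys omitted):

```json
{
  "backbone_type": "transformer",
  "timesteps": 600,
  "seq_len": 96,
  "batch_size": 16,
  "cont_target": "x0",
  "cont_loss_weighting": "inv_std",
  "snr_weighted_loss": true,
  "quantile_loss_weight": 0.2,
  "use_quantile_transform": true,
  "cont_post_calibrate": true,
  "use_temporal_stage1": true
}
```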
12. What’s Actually Trained vs What’s Post‑Processed
Trained
- Temporal GRU (trend)
- Diffusion residual model (continuous + discrete)
Post‑Processed (KS‑only)
- Type1/2/3/5/6 replaced by empirical resampling
This distinction matters: the postprocess improves per-feature KS but may break joint (cross-feature) realism.
13. Why It’s Still Hard
- Type1/2/3 are event‑driven and piecewise constant
- Diffusion (Gaussian DDPM + MSE) tends to smooth/blur these
- Temporal vs distribution objectives pull in opposite directions
14. Where To Improve Next
- Replace the KS-only postprocess with conditional generators:
  - Type1: program generator (HMM / schedule)
  - Type2: controller emulator (PID-like)
  - Type3: actuator dynamics (dwell + rate + saturation)
- Add regime conditioning for Type4 PVs
- Add joint realism checks (cross-feature correlation)
15. Key Files (Complete but Pruned)
mask-ddpm/
  report.md
  docs/
    README.md
    architecture.md
    evaluation.md
    decisions.md
    experiments.md
    ideas.md
  example/
    config.json
    config_no_temporal.json
    config_temporal_strong.json
    feature_split.json
    data_utils.py
    prepare_data.py
    hybrid_diffusion.py
    train.py
    sample.py
    export_samples.py
    evaluate_generated.py
    run_all.py
    run_compare.py
    diagnose_ks.py
    filtered_metrics.py
    ranked_ks.py
    program_stats.py
    controller_stats.py
    actuator_stats.py
    pv_stats.py
    aux_stats.py
    postprocess_types.py
    results/
      generated.csv
      generated_post.csv
      eval.json
      eval_post.json
      cont_stats.json
      disc_vocab.json
      metrics_history.csv
16. Summary
The current project is a hybrid diffusion system with a two‑stage temporal+residual design, built to balance distribution alignment and temporal realism. The architecture is modular, with explicit type‑aware diagnostics and postprocessing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace KS‑only postprocessing with conditional, structurally consistent generators for Type1/2/3/5/6 features.