mask-ddpm Project Report (Detailed)

This report is a complete, beginner-friendly description of the current project implementation as of the latest code in this repo. It explains what the project does, how data flows, what each file is for, and why the architecture is designed this way.


0. TL;DR

We generate multivariate ICS time series by (1) learning the temporal trend with a GRU and (2) learning residuals with a hybrid diffusion model (continuous DDPM + discrete masked diffusion). We then evaluate with tie-aware KS and apply type-aware post-processing for diagnostic KS reduction.


1. Project Goal

We want synthetic ICS sequences that are:

  1. Distribution-aligned (per-feature CDF matches real data → low KS)
  2. Temporally consistent (lag-1 correlation and trend are realistic)
  3. Discrete-valid (state tokens are legal and frequency-consistent)

This is hard because distribution and temporal structure often conflict in a single model.


2. Data & Feature Schema

Input data: HAI CSV files (compressed) in dataset/hai/hai-21.03/.

Feature split: example/feature_split.json

  • continuous: real-valued sensors/actuators
  • discrete: state tokens / modes
  • time_column: time index (not trained)

3. Preprocessing

File: example/prepare_data.py

Continuous features

  • Mean/std statistics
  • Quantile table (if use_quantile_transform=true)
  • Optional transforms (log1p etc.)
  • Output: example/results/cont_stats.json

Discrete features

  • Token vocab from data
  • Output: example/results/disc_vocab.json

File: example/data_utils.py contains

  • Normalization / inverse
  • Quantile transform / inverse
  • Post-calibration helpers
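The quantile transform can be sketched with NumPy; the helper names (`fit_quantile_table`, `to_uniform`, `from_uniform`) are illustrative, not the actual `data_utils.py` API:

```python
import numpy as np

def fit_quantile_table(x, n_quantiles=256):
    """Tabulate the empirical CDF of one feature as (prob, value) pairs."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return probs, np.quantile(x, probs)

def to_uniform(x, probs, values):
    """Map raw values to approximately uniform [0, 1] via the fitted CDF."""
    return np.interp(x, values, probs)

def from_uniform(u, probs, values):
    """Inverse transform: map uniform values back to the raw scale."""
    return np.interp(u, probs, values)

# Round trip on synthetic data
rng = np.random.default_rng(0)
x = rng.lognormal(size=1000)
probs, values = fit_quantile_table(x)
u = to_uniform(x, probs, values)
x_back = from_uniform(u, probs, values)
```

For strictly increasing quantile values the forward and inverse maps are exact piecewise-linear inverses of each other, so the round trip recovers the input up to float precision.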

4. Architecture

4.1 Stage-1 Temporal GRU (Trend)

File: example/hybrid_diffusion.py

  • Class: TemporalGRUGenerator
  • Input: continuous sequence
  • Output: trend sequence (teacher-forced)
  • Purpose: capture temporal structure
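A minimal teacher-forced trend model in this spirit (class and argument names are hypothetical, not the repo's actual `TemporalGRUGenerator` signature):

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Sketch of a Stage-1 trend model: a GRU that predicts the next step
    from the real (teacher-forced) prefix."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        # Teacher forcing: the real prefix x[:, :t+1] predicts x[:, t+1]
        h, _ = self.gru(x)
        return self.head(h)

x = torch.randn(8, 96, 12)               # (batch, seq_len, n_cont_features)
model = TrendGRU(n_features=12)
pred = model(x)                          # pred[:, t] estimates x[:, t+1]
loss = nn.functional.mse_loss(pred[:, :-1], x[:, 1:])   # Stage-1 MSE objective
```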

4.2 Stage-2 Hybrid Diffusion (Residual)

File: example/hybrid_diffusion.py

Continuous branch

  • Gaussian DDPM
  • Predicts residual (or noise)

Discrete branch

  • Mask diffusion (masked tokens)
  • Classifier head per discrete column

Backbone

  • Current config uses Transformer encoder (backbone_type=transformer)
  • GRU is still supported as option

Conditioning

  • File-id conditioning (use_condition=true, condition_type=file_id)
  • Type1 (setpoint/demand) can be passed as continuous condition (cond_cont)
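The hybrid forward (corruption) process can be sketched as follows; the mask-token id, the linear masking schedule, and all tensor shapes are assumptions for illustration:

```python
import torch

MASK_ID = 0  # hypothetical reserved mask-token id

def corrupt(x_resid, tokens, t, T, alphas_cumprod):
    """One hybrid forward step, sketched: Gaussian DDPM noise on the
    continuous residuals, random masking on the discrete tokens."""
    # Continuous branch: q(x_t | x_0) = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(x_resid)
    x_t = a_bar.sqrt() * x_resid + (1.0 - a_bar).sqrt() * eps
    # Discrete branch: mask each token independently with probability t / T
    p_mask = (t.float() / T).view(-1, 1, 1)
    masked = torch.rand(tokens.shape) < p_mask
    tokens_t = torch.where(masked, torch.full_like(tokens, MASK_ID), tokens)
    return x_t, eps, tokens_t, masked

T = 600
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x_resid = torch.randn(4, 96, 10)          # (batch, seq_len, n_cont)
tokens = torch.randint(1, 5, (4, 96, 3))  # token ids start at 1; 0 = mask
t = torch.randint(0, T, (4,))
x_t, eps, tokens_t, masked = corrupt(x_resid, tokens, t, T, alphas_cumprod)
```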

5. Training Flow

File: example/train.py

5.1 Stage-1 Temporal training

  • Use continuous features (excluding Type1/Type5)
  • Teacher-forced GRU predicts next step
  • Loss: MSE
  • Output: temporal.pt

5.2 Stage-2 Diffusion training

  • Compute residual: x_resid = x_cont - trend
  • Sample time step t
  • Add noise for continuous; mask tokens for discrete
  • Model predicts:
    • eps_pred for continuous residual
    • logits for discrete tokens

Loss design

  • Continuous loss: MSE on eps or x0 (cont_target)
  • Optional weighting: inverse variance (cont_loss_weighting=inv_std)
  • Optional SNR weighting (snr_weighted_loss)
  • Optional quantile loss (align residual distribution)
  • Optional residual mean/std loss
  • Discrete loss: cross-entropy on masked tokens
  • Total: loss = λ * loss_cont + (1 - λ) * loss_disc
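The loss terms above can be combined as in this sketch; `lam`, `inv_std`, and all shapes are illustrative stand-ins for the options read from config.json (cont_target, cont_loss_weighting, ...):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(eps_pred, eps_true, logits, targets, masked, inv_std, lam=0.7):
    """Sketch of the weighted total loss for the hybrid model."""
    # Continuous: per-feature MSE on eps (or x0), inverse-std weighted
    per_feat = ((eps_pred - eps_true) ** 2).mean(dim=(0, 1))     # (n_cont,)
    loss_cont = (per_feat * inv_std).mean()
    # Discrete: cross-entropy restricted to masked positions
    ce = F.cross_entropy(logits.flatten(0, -2), targets.flatten(),
                         reduction="none")
    loss_disc = (ce * masked.flatten().float()).sum() / masked.sum().clamp(min=1)
    # Total: loss = lam * loss_cont + (1 - lam) * loss_disc
    return lam * loss_cont + (1.0 - lam) * loss_disc

eps_pred = torch.randn(4, 96, 10)
eps_true = torch.randn(4, 96, 10)
logits = torch.randn(4, 96, 3, 7)          # 3 discrete columns, vocab size 7
targets = torch.randint(0, 7, (4, 96, 3))
masked = torch.rand(4, 96, 3) < 0.5
inv_std = torch.ones(10)
loss = hybrid_loss(eps_pred, eps_true, logits, targets, masked, inv_std)
```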

6. Sampling & Export

File: example/export_samples.py

Steps:

  1. Initialize continuous with noise
  2. Initialize discrete with masks
  3. Reverse diffusion loop from t=T..0
  4. Add trend back (if temporal stage enabled)
  5. Inverse transforms (quantile → raw)
  6. Clip/bound if configured
  7. Merge back Type1 (conditioning) and Type5 (derived)
  8. Write generated.csv
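Steps 1 and 3 above correspond to standard DDPM ancestral sampling, sketched here for the continuous branch only (`eps_model` is a stand-in for the trained network; discrete unmasking and trend re-addition are omitted):

```python
import torch

@torch.no_grad()
def reverse_loop(eps_model, shape, betas):
    """Minimal DDPM ancestral sampler for the continuous residual branch."""
    T = betas.shape[0]
    alphas = 1.0 - betas
    a_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                           # step 1: pure noise
    for t in range(T - 1, -1, -1):                   # step 3: t = T..0
        eps = eps_model(x, t)
        mean = (x - betas[t] / (1.0 - a_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x   # residual sample; the GRU trend is added back afterwards

betas = torch.linspace(1e-4, 0.02, 20)               # tiny schedule for the sketch
x_resid = reverse_loop(lambda x, t: torch.zeros_like(x), (2, 96, 10), betas)
```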

7. Evaluation

File: example/evaluate_generated.py

Metrics

  • KS (tie-aware) for continuous
  • JSD for discrete
  • lag-1 correlation for temporal consistency
  • quantile diffs, mean/std errors

Important

  • The reference path supports globs and aggregates all matched files
  • The KS implementation is tie-aware (correct for spiky/quantized data)
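The tie-aware idea is to evaluate both empirical CDFs on the pooled support, so tied / quantized values contribute correctly. A minimal sketch of the two metrics (not the repo's exact implementation):

```python
import numpy as np

def ks_tie_aware(a, b):
    """Two-sample KS statistic with both ECDFs evaluated on the pooled
    support, which handles heavily tied / quantized data correctly."""
    support = np.unique(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), support, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), support, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def lag1_corr(x):
    """Lag-1 autocorrelation of a 1-D series (temporal-consistency metric)."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])
```

For example, two identical samples give KS = 0, while fully disjoint samples give KS = 1.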

Outputs:

  • example/results/eval.json

8. Diagnostics

  • example/diagnose_ks.py: CDF plots and per-feature KS
  • example/ranked_ks.py: ranked KS + contribution
  • example/filtered_metrics.py: filtered KS excluding outliers
  • example/program_stats.py: Type1 stats
  • example/controller_stats.py: Type2 stats
  • example/actuator_stats.py: Type3 stats
  • example/pv_stats.py: Type4 stats
  • example/aux_stats.py: Type6 stats

9. Type-Aware Modeling

To reduce KS dominated by a few variables, the project uses Type categories defined in config:

  • Type1: setpoints / demand (schedule-driven)
  • Type2: controller outputs
  • Type3: actuator positions
  • Type4: PV sensors
  • Type5: derived tags
  • Type6: auxiliary / coupling

Current implementation (diagnostic KS baseline)

File: example/postprocess_types.py

  • Type1/2/3/5/6 → empirical resampling from real distribution
  • Type4 → keep diffusion output

This is not the final model, but it provides a KS upper bound for diagnosis.
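The empirical resampling step can be sketched with pandas; the function and column names are hypothetical:

```python
import numpy as np
import pandas as pd

def empirical_resample(df_gen, df_real, columns, seed=0):
    """Replace selected generated columns with i.i.d. draws from the real
    marginal. This drives per-feature KS toward ~0 but discards temporal and
    cross-feature structure, which is why it is diagnostic only."""
    rng = np.random.default_rng(seed)
    out = df_gen.copy()
    for col in columns:
        out[col] = rng.choice(df_real[col].to_numpy(), size=len(out))
    return out

real = pd.DataFrame({"sp": [10.0, 10.0, 20.0], "pv": [1.0, 2.0, 3.0]})
gen = pd.DataFrame({"sp": [11.3, 14.2, 19.7], "pv": [1.1, 2.2, 2.9]})
post = empirical_resample(gen, real, columns=["sp"])   # Type1-like column only
```

Resampled columns can only take values seen in the real data, while untouched columns (the Type4-like `pv` here) keep the diffusion output.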

Outputs:

  • example/results/generated_post.csv
  • example/results/eval_post.json

10. Pipeline

File: example/run_all.py

Default pipeline:

  1. prepare_data
  2. train
  3. export_samples
  4. evaluate_generated (generated.csv)
  5. postprocess_types (generated_post.csv)
  6. evaluate_generated (eval_post.json)
  7. diagnostics scripts

Linux:

python example/run_all.py --device cuda --config example/config.json

Windows (PowerShell):

python run_all.py --device cuda --config config.json

11. Current Configuration (Key Defaults)

From example/config.json:

  • backbone_type: transformer
  • timesteps: 600
  • seq_len: 96
  • batch_size: 16
  • cont_target: x0
  • cont_loss_weighting: inv_std
  • snr_weighted_loss: true
  • quantile_loss_weight: 0.2
  • use_quantile_transform: true
  • cont_post_calibrate: true
  • use_temporal_stage1: true
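Collected into JSON, these defaults would correspond to a fragment of example/config.json roughly like the following (other keys omitted):

```json
{
  "backbone_type": "transformer",
  "timesteps": 600,
  "seq_len": 96,
  "batch_size": 16,
  "cont_target": "x0",
  "cont_loss_weighting": "inv_std",
  "snr_weighted_loss": true,
  "quantile_loss_weight": 0.2,
  "use_quantile_transform": true,
  "cont_post_calibrate": true,
  "use_temporal_stage1": true
}
```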

12. What's Actually Trained vs. What's Post-Processed

Trained

  • Temporal GRU (trend)
  • Diffusion residual model (continuous + discrete)

Post-Processed (KS-only)

  • Type1/2/3/5/6 replaced by empirical resampling

This is important: post-processing improves KS but may break joint realism.


13. Why It's Still Hard

  • Type1/2/3 are event-driven and piecewise-constant
  • Diffusion (Gaussian DDPM + MSE) tends to smooth/blur these
  • Temporal vs distribution objectives pull in opposite directions

14. Where To Improve Next

  1. Replace KS-only post-processing with conditional generators:

    • Type1: program generator (HMM / schedule)
    • Type2: controller emulator (PID-like)
    • Type3: actuator dynamics (dwell + rate + saturation)
  2. Add regime conditioning for Type4 PVs

  3. Joint realism checks (cross-feature correlation)


15. Key Files (Complete but Pruned)

mask-ddpm/
  report.md
  docs/
    README.md
    architecture.md
    evaluation.md
    decisions.md
    experiments.md
    ideas.md
  example/
    config.json
    config_no_temporal.json
    config_temporal_strong.json
    feature_split.json
    data_utils.py
    prepare_data.py
    hybrid_diffusion.py
    train.py
    sample.py
    export_samples.py
    evaluate_generated.py
    run_all.py
    run_compare.py
    diagnose_ks.py
    filtered_metrics.py
    ranked_ks.py
    program_stats.py
    controller_stats.py
    actuator_stats.py
    pv_stats.py
    aux_stats.py
    postprocess_types.py
    results/
      generated.csv
      generated_post.csv
      eval.json
      eval_post.json
      cont_stats.json
      disc_vocab.json
      metrics_history.csv

16. Summary

The current project is a hybrid diffusion system with a two-stage temporal + residual design, built to balance distribution alignment and temporal realism. The architecture is modular, with explicit type-aware diagnostics and post-processing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace KS-only post-processing with conditional, structurally consistent generators for Type1/2/3/5/6 features.