mask-ddpm Project Report (Detailed)

This report is a complete, beginner-friendly description of the current project implementation as of the latest code in this repo. It explains what the project does, how data flows, what each file is for, and why the architecture is designed this way.


0. TL;DR

We generate multivariate ICS time series by (1) learning the temporal trend with a GRU and (2) learning residuals with a hybrid diffusion model (continuous DDPM + discrete masked diffusion). We then evaluate with tie-aware KS and apply type-aware post-processing for diagnostic KS reduction.


1. Project Goal

We want synthetic ICS sequences that are:

  1. Distribution-aligned (per-feature CDF matches real data → low KS)
  2. Temporally consistent (lag-1 correlation and trend are realistic)
  3. Discrete-valid (state tokens are legal and frequency-consistent)

This is hard because distribution and temporal structure often conflict in a single model.


2. Data & Feature Schema

Input data: HAI CSV files (compressed) in dataset/hai/hai-21.03/.

Feature split: example/feature_split.json

  • continuous: real-valued sensors/actuators
  • discrete: state tokens / modes
  • time_column: time index (not trained)

3. Preprocessing

File: example/prepare_data.py

Continuous features

  • Mean/std statistics
  • Quantile table (if use_quantile_transform=true)
  • Optional transforms (log1p etc.)
  • Output: example/results/cont_stats.json

Discrete features

  • Token vocab from data
  • Output: example/results/disc_vocab.json

File: example/data_utils.py contains

  • Normalization / inverse
  • Quantile transform / inverse
  • Post-calibration helpers
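The quantile transform can be sketched with NumPy; the helper names (`fit_quantile_table`, `to_uniform`, `from_uniform`) are illustrative, not the actual `data_utils.py` API:

```python
import numpy as np

def fit_quantile_table(x, n_quantiles=256):
    """Tabulate the empirical CDF of one feature as (prob, value) pairs."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return probs, np.quantile(x, probs)

def to_uniform(x, probs, values):
    """Map raw values to approximately uniform [0, 1] via the fitted CDF."""
    return np.interp(x, values, probs)

def from_uniform(u, probs, values):
    """Inverse transform: map uniform values back to the raw scale."""
    return np.interp(u, probs, values)

# Round trip on synthetic data
rng = np.random.default_rng(0)
x = rng.lognormal(size=1000)
probs, values = fit_quantile_table(x)
u = to_uniform(x, probs, values)
x_back = from_uniform(u, probs, values)
```

For strictly increasing quantile values the forward and inverse maps are exact piecewise-linear inverses of each other, so the round trip recovers the input up to float precision.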

4. Architecture

4.1 Stage-1 Temporal GRU (Trend)

File: example/hybrid_diffusion.py

  • Class: TemporalGRUGenerator
  • Input: continuous sequence
  • Output: trend sequence (teacher-forced)
  • Purpose: capture temporal structure
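A minimal teacher-forced trend model in this spirit (class and argument names are hypothetical, not the repo's actual `TemporalGRUGenerator` signature):

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Sketch of a Stage-1 trend model: a GRU that predicts the next step
    from the real (teacher-forced) prefix."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        # Teacher forcing: the real prefix x[:, :t+1] predicts x[:, t+1]
        h, _ = self.gru(x)
        return self.head(h)

x = torch.randn(8, 96, 12)               # (batch, seq_len, n_cont_features)
model = TrendGRU(n_features=12)
pred = model(x)                          # pred[:, t] estimates x[:, t+1]
loss = nn.functional.mse_loss(pred[:, :-1], x[:, 1:])   # Stage-1 MSE objective
```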

4.2 Stage-2 Hybrid Diffusion (Residual)

File: example/hybrid_diffusion.py

Continuous branch

  • Gaussian DDPM
  • Predicts residual (or noise)

Discrete branch

  • Mask diffusion (masked tokens)
  • Classifier head per discrete column

Backbone

  • Current config uses Transformer encoder (backbone_type=transformer)
  • GRU is still supported as option

Conditioning

  • File-id conditioning (use_condition=true, condition_type=file_id)
  • Type1 (setpoint/demand) can be passed as continuous condition (cond_cont)
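The hybrid forward (corruption) process can be sketched as follows; the mask-token id, the linear masking schedule, and all tensor shapes are assumptions for illustration:

```python
import torch

MASK_ID = 0  # hypothetical reserved mask-token id

def corrupt(x_resid, tokens, t, T, alphas_cumprod):
    """One hybrid forward step, sketched: Gaussian DDPM noise on the
    continuous residuals, random masking on the discrete tokens."""
    # Continuous branch: q(x_t | x_0) = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(x_resid)
    x_t = a_bar.sqrt() * x_resid + (1.0 - a_bar).sqrt() * eps
    # Discrete branch: mask each token independently with probability t / T
    p_mask = (t.float() / T).view(-1, 1, 1)
    masked = torch.rand(tokens.shape) < p_mask
    tokens_t = torch.where(masked, torch.full_like(tokens, MASK_ID), tokens)
    return x_t, eps, tokens_t, masked

T = 600
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x_resid = torch.randn(4, 96, 10)          # (batch, seq_len, n_cont)
tokens = torch.randint(1, 5, (4, 96, 3))  # token ids start at 1; 0 = mask
t = torch.randint(0, T, (4,))
x_t, eps, tokens_t, masked = corrupt(x_resid, tokens, t, T, alphas_cumprod)
```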

5. Training Flow

File: example/train.py

5.1 Stage-1 Temporal training

  • Use continuous features (excluding Type1/Type5)
  • Teacher-forced GRU predicts next step
  • Loss: MSE
  • Output: temporal.pt

5.2 Stage-2 Diffusion training

  • Compute residual: x_resid = x_cont - trend
  • Sample time step t
  • Add noise for continuous; mask tokens for discrete
  • Model predicts:
    • eps_pred for continuous residual
    • logits for discrete tokens

Loss design

  • Continuous loss: MSE on eps or x0 (cont_target)
  • Optional weighting: inverse variance (cont_loss_weighting=inv_std)
  • Optional SNR weighting (snr_weighted_loss)
  • Optional quantile loss (align residual distribution)
  • Optional residual mean/std loss
  • Discrete loss: cross-entropy on masked tokens
  • Total: loss = λ * loss_cont + (1 - λ) * loss_disc
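The loss terms above can be combined as in this sketch; `lam`, `inv_std`, and all shapes are illustrative stand-ins for the options read from config.json (cont_target, cont_loss_weighting, ...):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(eps_pred, eps_true, logits, targets, masked, inv_std, lam=0.7):
    """Sketch of the weighted total loss for the hybrid model."""
    # Continuous: per-feature MSE on eps (or x0), inverse-std weighted
    per_feat = ((eps_pred - eps_true) ** 2).mean(dim=(0, 1))     # (n_cont,)
    loss_cont = (per_feat * inv_std).mean()
    # Discrete: cross-entropy restricted to masked positions
    ce = F.cross_entropy(logits.flatten(0, -2), targets.flatten(),
                         reduction="none")
    loss_disc = (ce * masked.flatten().float()).sum() / masked.sum().clamp(min=1)
    # Total: loss = lam * loss_cont + (1 - lam) * loss_disc
    return lam * loss_cont + (1.0 - lam) * loss_disc

eps_pred = torch.randn(4, 96, 10)
eps_true = torch.randn(4, 96, 10)
logits = torch.randn(4, 96, 3, 7)          # 3 discrete columns, vocab size 7
targets = torch.randint(0, 7, (4, 96, 3))
masked = torch.rand(4, 96, 3) < 0.5
inv_std = torch.ones(10)
loss = hybrid_loss(eps_pred, eps_true, logits, targets, masked, inv_std)
```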

6. Sampling & Export

File: example/export_samples.py

Steps:

  1. Initialize continuous with noise
  2. Initialize discrete with masks
  3. Reverse diffusion loop from t=T..0
  4. Add trend back (if temporal stage enabled)
  5. Inverse transforms (quantile → raw)
  6. Clip/bound if configured
  7. Merge back Type1 (conditioning) and Type5 (derived)
  8. Write generated.csv
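Steps 1 and 3 above correspond to standard DDPM ancestral sampling, sketched here for the continuous branch only (`eps_model` is a stand-in for the trained network; discrete unmasking and trend re-addition are omitted):

```python
import torch

@torch.no_grad()
def reverse_loop(eps_model, shape, betas):
    """Minimal DDPM ancestral sampler for the continuous residual branch."""
    T = betas.shape[0]
    alphas = 1.0 - betas
    a_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                           # step 1: pure noise
    for t in range(T - 1, -1, -1):                   # step 3: t = T..0
        eps = eps_model(x, t)
        mean = (x - betas[t] / (1.0 - a_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x   # residual sample; the GRU trend is added back afterwards

betas = torch.linspace(1e-4, 0.02, 20)               # tiny schedule for the sketch
x_resid = reverse_loop(lambda x, t: torch.zeros_like(x), (2, 96, 10), betas)
```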

7. Evaluation

File: example/evaluate_generated.py

Metrics

  • KS (tie-aware) for continuous
  • JSD for discrete
  • lag-1 correlation for temporal consistency
  • quantile diffs, mean/std errors

Important

  • The reference path supports globs and aggregates all matched files
  • The KS implementation is tie-aware (correct for spiky/quantized data)
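The tie-aware idea is to evaluate both empirical CDFs on the pooled support, so tied / quantized values contribute correctly. A minimal sketch of the two metrics (not the repo's exact implementation):

```python
import numpy as np

def ks_tie_aware(a, b):
    """Two-sample KS statistic with both ECDFs evaluated on the pooled
    support, which handles heavily tied / quantized data correctly."""
    support = np.unique(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), support, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), support, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def lag1_corr(x):
    """Lag-1 autocorrelation of a 1-D series (temporal-consistency metric)."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])
```

For example, two identical samples give KS = 0, while fully disjoint samples give KS = 1.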

Outputs:

  • example/results/eval.json

8. Diagnostics

  • example/diagnose_ks.py: CDF plots and per-feature KS
  • example/ranked_ks.py: ranked KS + contribution
  • example/filtered_metrics.py: filtered KS excluding outliers
  • example/program_stats.py: Type1 stats
  • example/controller_stats.py: Type2 stats
  • example/actuator_stats.py: Type3 stats
  • example/pv_stats.py: Type4 stats
  • example/aux_stats.py: Type6 stats

9. Type-Aware Modeling

To reduce KS dominated by a few variables, the project uses Type categories defined in config:

  • Type1: setpoints / demand (schedule-driven)
  • Type2: controller outputs
  • Type3: actuator positions
  • Type4: PV sensors
  • Type5: derived tags
  • Type6: auxiliary / coupling

Current implementation (diagnostic KS baseline)

File: example/postprocess_types.py

  • Type1/2/3/5/6 → empirical resampling from real distribution
  • Type4 → keep diffusion output

This is not the final model, but it provides a KS upper bound for diagnosis.
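The empirical resampling step can be sketched with pandas; the function and column names are hypothetical:

```python
import numpy as np
import pandas as pd

def empirical_resample(df_gen, df_real, columns, seed=0):
    """Replace selected generated columns with i.i.d. draws from the real
    marginal. This drives per-feature KS toward ~0 but discards temporal and
    cross-feature structure, which is why it is diagnostic only."""
    rng = np.random.default_rng(seed)
    out = df_gen.copy()
    for col in columns:
        out[col] = rng.choice(df_real[col].to_numpy(), size=len(out))
    return out

real = pd.DataFrame({"sp": [10.0, 10.0, 20.0], "pv": [1.0, 2.0, 3.0]})
gen = pd.DataFrame({"sp": [11.3, 14.2, 19.7], "pv": [1.1, 2.2, 2.9]})
post = empirical_resample(gen, real, columns=["sp"])   # Type1-like column only
```

Resampled columns can only take values seen in the real data, while untouched columns (the Type4-like `pv` here) keep the diffusion output.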

Outputs:

  • example/results/generated_post.csv
  • example/results/eval_post.json

10. Pipeline

File: example/run_all.py

Default pipeline:

  1. prepare_data
  2. train
  3. export_samples
  4. evaluate_generated (generated.csv)
  5. postprocess_types (generated_post.csv)
  6. evaluate_generated (eval_post.json)
  7. diagnostics scripts

Linux:

python example/run_all.py --device cuda --config example/config.json

Windows (PowerShell):

python run_all.py --device cuda --config config.json

11. Current Configuration (Key Defaults)

From example/config.json:

  • backbone_type: transformer
  • timesteps: 600
  • seq_len: 96
  • batch_size: 16
  • cont_target: x0
  • cont_loss_weighting: inv_std
  • snr_weighted_loss: true
  • quantile_loss_weight: 0.2
  • use_quantile_transform: true
  • cont_post_calibrate: true
  • use_temporal_stage1: true
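Collected into JSON, these defaults would correspond to a fragment of example/config.json roughly like the following (other keys omitted):

```json
{
  "backbone_type": "transformer",
  "timesteps": 600,
  "seq_len": 96,
  "batch_size": 16,
  "cont_target": "x0",
  "cont_loss_weighting": "inv_std",
  "snr_weighted_loss": true,
  "quantile_loss_weight": 0.2,
  "use_quantile_transform": true,
  "cont_post_calibrate": true,
  "use_temporal_stage1": true
}
```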

12. What's Actually Trained vs. What's Post-Processed

Trained

  • Temporal GRU (trend)
  • Diffusion residual model (continuous + discrete)

Post-Processed (KS-only)

  • Type1/2/3/5/6 replaced by empirical resampling

This is important: post-processing improves KS but may break joint realism.


13. Why It's Still Hard

  • Type1/2/3 are event-driven and piecewise-constant
  • Diffusion (Gaussian DDPM + MSE) tends to smooth/blur these
  • Temporal vs distribution objectives pull in opposite directions

14. Where To Improve Next

  1. Replace KS-only post-processing with conditional generators:

    • Type1: program generator (HMM / schedule)
    • Type2: controller emulator (PID-like)
    • Type3: actuator dynamics (dwell + rate + saturation)
  2. Add regime conditioning for Type4 PVs

  3. Joint realism checks (cross-feature correlation)


15. Key Files (Complete but Pruned)

mask-ddpm/
  report.md
  docs/
    README.md
    architecture.md
    evaluation.md
    decisions.md
    experiments.md
    ideas.md
  example/
    config.json
    config_no_temporal.json
    config_temporal_strong.json
    feature_split.json
    data_utils.py
    prepare_data.py
    hybrid_diffusion.py
    train.py
    sample.py
    export_samples.py
    evaluate_generated.py
    run_all.py
    run_compare.py
    diagnose_ks.py
    filtered_metrics.py
    ranked_ks.py
    program_stats.py
    controller_stats.py
    actuator_stats.py
    pv_stats.py
    aux_stats.py
    postprocess_types.py
    results/
      generated.csv
      generated_post.csv
      eval.json
      eval_post.json
      cont_stats.json
      disc_vocab.json
      metrics_history.csv

16. Summary

The current project is a hybrid diffusion system with a two-stage temporal + residual design, built to balance distribution alignment and temporal realism. The architecture is modular, with explicit type-aware diagnostics and post-processing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace KS-only post-processing with conditional, structurally consistent generators for Type1/2/3/5/6 features.