# mask-ddpm Project Report (Detailed)

This report is a **complete, beginner‑friendly** description of the current project implementation as of the latest code in this repo. It explains **what the project does**, **how data flows**, **what each file is for**, and **why the architecture is designed this way**.

---
## 0. TL;DR

We generate multivariate ICS time‑series by **(1) learning the temporal trend with a GRU** and **(2) learning the residuals with a hybrid diffusion model** (continuous DDPM + discrete masked diffusion). We then evaluate with **tie‑aware KS** and run **type‑aware postprocessing** as a diagnostic baseline for KS reduction.

---
## 1. Project Goal

We want synthetic ICS sequences that are:

1) **Distribution‑aligned** (per‑feature CDF matches the real data → low KS)
2) **Temporally consistent** (realistic lag‑1 correlation and trend)
3) **Discrete‑valid** (state tokens are legal and frequency‑consistent)

This is hard because **distribution** and **temporal structure** often conflict inside a single model.

---
## 2. Data & Feature Schema

**Input data**: HAI CSV files (compressed) in `dataset/hai/hai-21.03/`.

**Feature split**: `example/feature_split.json`
- `continuous`: real‑valued sensors/actuators
- `discrete`: state tokens / modes
- `time_column`: time index (not trained)

---
## 3. Preprocessing

File: `example/prepare_data.py`

### Continuous features
- Mean/std statistics
- Quantile table (if `use_quantile_transform=true`)
- Optional transforms (e.g. `log1p`)
- Output: `example/results/cont_stats.json`

### Discrete features
- Token vocabulary built from the data
- Output: `example/results/disc_vocab.json`

File: `example/data_utils.py` contains
- Normalization / inverse
- Quantile transform / inverse
- Post‑calibration helpers
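The quantile round trip can be sketched as follows (function names and the table layout are illustrative assumptions, not the actual helpers in `example/data_utils.py`):

```python
import numpy as np

def quantile_transform(x, q_table):
    """Map raw values to [0, 1] via the empirical CDF stored in q_table."""
    # q_table: sorted array of values sampled from the training distribution
    ranks = np.searchsorted(q_table, x, side="right")
    return ranks / len(q_table)

def inverse_quantile_transform(u, q_table):
    """Map [0, 1] values back to the raw scale by indexing the table."""
    idx = np.clip(np.rint(u * len(q_table)).astype(int), 0, len(q_table) - 1)
    return q_table[idx]

# Round trip on a toy table built from 1000 standard-normal samples
q = np.sort(np.random.default_rng(0).standard_normal(1000))
x = np.array([-1.0, 0.0, 1.0])
u = quantile_transform(x, q)
x_back = inverse_quantile_transform(u, q)
```

The inverse is only exact up to the table resolution, which is why the repo also keeps post‑calibration helpers.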
---
## 4. Architecture

### 4.1 Stage‑1 Temporal GRU (Trend)
File: `example/hybrid_diffusion.py`
- Class: `TemporalGRUGenerator`
- Input: continuous sequence
- Output: **trend sequence** (trained with teacher forcing)
- Purpose: capture temporal structure
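The Stage‑1 module might look roughly like this sketch (class name `TrendGRU`, hidden size, and the linear head are illustrative assumptions, not the repo's actual `TemporalGRUGenerator`):

```python
import torch
import torch.nn as nn

class TrendGRU(nn.Module):
    """Minimal stand-in for a teacher-forced trend predictor."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):          # x: (batch, seq_len, n_features)
        h, _ = self.gru(x)         # teacher forcing: feed the real sequence
        return self.head(h)        # per-step prediction of the next value

model = TrendGRU(n_features=8)
x = torch.randn(4, 96, 8)          # seq_len 96 matches the config
pred = model(x[:, :-1])            # predict steps 1..95 from steps 0..94
loss = nn.functional.mse_loss(pred, x[:, 1:])
```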
### 4.2 Stage‑2 Hybrid Diffusion (Residual)
File: `example/hybrid_diffusion.py`

**Continuous branch**
- Gaussian DDPM
- Predicts the **residual** (or its noise)

**Discrete branch**
- Masked diffusion over tokens
- One classifier head per discrete column

**Backbone**
- Current config uses a **Transformer encoder** (`backbone_type=transformer`)
- GRU is still supported as an option

**Conditioning**
- File‑id conditioning (`use_condition=true`, `condition_type=file_id`)
- Type‑1 (setpoint/demand) can be passed as a **continuous condition** (`cond_cont`)

---
## 5. Training Flow

File: `example/train.py`

### 5.1 Stage‑1 Temporal training
- Uses continuous features (excluding Type1/Type5)
- A teacher‑forced GRU predicts the next step
- Loss: **MSE**
- Output: `temporal.pt`

### 5.2 Stage‑2 Diffusion training
- Compute the residual: `x_resid = x_cont - trend`
- Sample a time step `t`
- Add Gaussian noise to continuous features; mask discrete tokens
- Model predicts:
  - **eps_pred** for the continuous residual
  - logits for the discrete tokens
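The corruption step above can be sketched as follows (the linear beta schedule and the linear masking schedule are illustrative assumptions, not necessarily what `train.py` uses):

```python
import numpy as np

T = 600                                    # diffusion steps (matches config)
betas = np.linspace(1e-4, 0.02, T)         # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

def corrupt(x_resid, tokens, t, mask_id, rng):
    """One training-time corruption: noise the residual, mask some tokens."""
    eps = rng.standard_normal(x_resid.shape)
    x_t = np.sqrt(alpha_bar[t]) * x_resid + np.sqrt(1 - alpha_bar[t]) * eps
    mask_prob = (t + 1) / T                # assumed linear masking schedule
    masked = tokens.copy()
    masked[rng.random(tokens.shape) < mask_prob] = mask_id
    return x_t, eps, masked

rng = np.random.default_rng(0)
x_resid = rng.standard_normal((96, 8))     # seq_len x continuous features
tokens = rng.integers(0, 5, size=(96, 3))  # seq_len x discrete features
x_t, eps, masked = corrupt(x_resid, tokens, t=599, mask_id=5, rng=rng)
```

At the final step `t = T - 1`, the continuous input is almost pure noise and every token is masked.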
### Loss design
- Continuous loss: MSE on eps or x0 (`cont_target`)
- Optional inverse‑variance weighting (`cont_loss_weighting=inv_std`)
- Optional SNR weighting (`snr_weighted_loss`)
- Optional quantile loss (aligns the residual distribution)
- Optional residual mean/std loss
- Discrete loss: cross‑entropy on masked tokens
- Total: `loss = λ * loss_cont + (1 - λ) * loss_disc`
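A minimal sketch of the total‑loss combination (argument shapes and the softmax cross‑entropy are generic; the optional weightings above are omitted):

```python
import numpy as np

def hybrid_loss(eps_pred, eps_true, logits, targets, mask, lam=0.5):
    """Combine continuous MSE and masked-token cross-entropy."""
    loss_cont = np.mean((eps_pred - eps_true) ** 2)
    # Cross-entropy, counted only on positions that were masked
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    loss_disc = np.sum(nll * mask) / np.maximum(mask.sum(), 1)
    return lam * loss_cont + (1 - lam) * loss_disc

# Tiny demo: 4 positions, 3-token vocabulary, all positions masked
eps_pred = np.zeros((4, 2)); eps_true = np.ones((4, 2))
logits = np.zeros((4, 3)); targets = np.zeros(4, dtype=int)
mask = np.ones(4)
total = hybrid_loss(eps_pred, eps_true, logits, targets, mask, lam=0.5)
```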
---

## 6. Sampling & Export

File: `example/export_samples.py`

Steps:
1) Initialize continuous features with Gaussian noise
2) Initialize discrete features with mask tokens
3) Run the reverse diffusion loop from `t = T` down to `0`
4) Add the trend back (if the temporal stage is enabled)
5) Apply inverse transforms (quantile → raw)
6) Clip/bound values if configured
7) Merge back Type1 (conditioning) and Type5 (derived) columns
8) Write `generated.csv`
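Steps 1 and 3 for the continuous branch follow the standard DDPM reverse loop, sketched below (a generic skeleton, not the repo's exact sampler; the discrete unmasking loop is omitted):

```python
import numpy as np

def sample(denoise_fn, shape, T, betas, rng):
    """DDPM-style reverse loop for the continuous residual branch."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # step 1: start from pure noise
    for t in range(T - 1, -1, -1):          # step 3: t = T-1 .. 0
        eps_hat = denoise_fn(x, t)          # model's noise prediction
        coef = betas[t] / np.sqrt(1 - alpha_bar[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                           # no noise added at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x                                # residual; trend is added back later

# Smoke test with a dummy denoiser that predicts zero noise
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)
out = sample(lambda x, t: np.zeros_like(x), (8, 4), 50, betas, rng)
```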
---

## 7. Evaluation

File: `example/evaluate_generated.py`

### Metrics
- **KS (tie‑aware)** for continuous features
- **JSD** for discrete features
- **Lag‑1 correlation** for temporal consistency
- Quantile differences, mean/std errors
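Lag‑1 correlation for a single feature column is simply the Pearson correlation between the series and its one-step shift (an illustrative helper, not the repo's exact code):

```python
import numpy as np

def lag1_corr(x):
    """Pearson correlation between x[t] and x[t+1] for one feature column."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# A slowly varying series scores near 1; white noise scores near 0.
t = np.linspace(0, 10, 500)
smooth = np.sin(t)
noise = np.random.default_rng(0).standard_normal(500)
```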
### Important
- The reference path supports **glob** patterns and aggregates **all matched files**
- The KS implementation is **tie‑aware** (correct for spiky/quantized data)
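A tie‑aware two‑sample KS can be computed by evaluating both empirical CDFs on the pooled unique values, so heavily tied (quantized/spiky) data is handled correctly (a sketch consistent with the description above, not necessarily the repo's exact code):

```python
import numpy as np

def ks_tie_aware(a, b):
    """Two-sample KS evaluated on the pooled support of both samples."""
    grid = np.union1d(a, b)                       # pooled unique values
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

# Heavily tied samples: P(a=0)=0.6 vs P(b=0)=0.2, so KS = 0.4
a = np.array([0, 0, 0, 1, 1])
b = np.array([0, 1, 1, 1, 1])
```

Using `side="right"` makes each ECDF right‑continuous, so every tie group contributes its full probability mass at its value.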
Outputs:
- `example/results/eval.json`

---
## 8. Diagnostics

- `example/diagnose_ks.py`: CDF plots and per‑feature KS
- `example/ranked_ks.py`: ranked KS and per‑feature contribution
- `example/filtered_metrics.py`: filtered KS excluding outliers
- `example/program_stats.py`: Type‑1 stats
- `example/controller_stats.py`: Type‑2 stats
- `example/actuator_stats.py`: Type‑3 stats
- `example/pv_stats.py`: Type‑4 stats
- `example/aux_stats.py`: Type‑6 stats

---
## 9. Type‑Aware Modeling

To keep the aggregate KS from being dominated by a few variables, the project uses **Type categories** defined in the config:
- **Type1**: setpoints / demand (schedule‑driven)
- **Type2**: controller outputs
- **Type3**: actuator positions
- **Type4**: PV sensors
- **Type5**: derived tags
- **Type6**: auxiliary / coupling

### Current implementation (diagnostic KS baseline)
File: `example/postprocess_types.py`
- Type1/2/3/5/6 → **empirical resampling** from the real distribution
- Type4 → keep the diffusion output
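One plausible way to implement the empirical resampling while preserving the generated ordering is rank matching (a hypothetical variant; `postprocess_types.py` may resample differently):

```python
import numpy as np

def empirical_resample(generated, real, rng):
    """Replace a generated column with rank-matched draws from the real marginal.

    Rank matching keeps the generated ordering (hence the temporal shape)
    while forcing the output marginal to follow the real data, which is
    what drives the per-feature KS toward zero.
    """
    ranks = np.argsort(np.argsort(generated))          # rank of each value
    real_sorted = np.sort(rng.choice(real, size=len(generated), replace=True))
    return real_sorted[ranks]

rng = np.random.default_rng(0)
real = rng.normal(10.0, 2.0, size=5000)   # "real" marginal, N(10, 2)
gen = rng.normal(0.0, 1.0, size=1000)     # generated column on a wrong scale
out = empirical_resample(gen, real, rng)
```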
This is **not** the final model; it bounds the achievable KS and serves as a diagnostic baseline.

Outputs:
- `example/results/generated_post.csv`
- `example/results/eval_post.json`

---
## 10. Pipeline

File: `example/run_all.py`

Default pipeline:
1) `prepare_data`
2) `train`
3) `export_samples`
4) `evaluate_generated` (on `generated.csv`)
5) `postprocess_types` (writes `generated_post.csv`)
6) `evaluate_generated` (writes `eval_post.json`)
7) diagnostics scripts
**Linux**:
```bash
python example/run_all.py --device cuda --config example/config.json
```

**Windows (PowerShell)**:
```powershell
python run_all.py --device cuda --config config.json
```

---
## 11. Current Configuration (Key Defaults)

From `example/config.json`:
- `backbone_type`: **transformer**
- `timesteps`: 600
- `seq_len`: 96
- `batch_size`: 16
- `cont_target`: `x0`
- `cont_loss_weighting`: `inv_std`
- `snr_weighted_loss`: true
- `quantile_loss_weight`: 0.2
- `use_quantile_transform`: true
- `cont_post_calibrate`: true
- `use_temporal_stage1`: true

---
## 12. What’s Actually Trained vs What’s Post‑Processed

**Trained**
- Temporal GRU (trend)
- Diffusion residual model (continuous + discrete)

**Post‑processed (KS‑only)**
- Type1/2/3/5/6 replaced by empirical resampling

This matters: postprocessing improves KS but **may break joint realism** across features.

---
## 13. Why It’s Still Hard

- Type1/2/3 signals are **event‑driven** and **piecewise constant**
- A Gaussian DDPM trained with MSE tends to smooth/blur them
- The temporal and distributional objectives pull in opposite directions

---
## 14. Where To Improve Next

1) Replace the KS‑only postprocessing with **conditional generators**:
   - Type1: program generator (HMM / schedule)
   - Type2: controller emulator (PID‑like)
   - Type3: actuator dynamics (dwell + rate + saturation)

2) Add regime conditioning for Type4 PVs

3) Add joint‑realism checks (cross‑feature correlation)

---
## 15. Key Files (Complete but Pruned)

```
mask-ddpm/
  report.md
  docs/
    README.md
    architecture.md
    evaluation.md
    decisions.md
    experiments.md
    ideas.md
  example/
    config.json
    config_no_temporal.json
    config_temporal_strong.json
    feature_split.json
    data_utils.py
    prepare_data.py
    hybrid_diffusion.py
    train.py
    sample.py
    export_samples.py
    evaluate_generated.py
    run_all.py
    run_compare.py
    diagnose_ks.py
    filtered_metrics.py
    ranked_ks.py
    program_stats.py
    controller_stats.py
    actuator_stats.py
    pv_stats.py
    aux_stats.py
    postprocess_types.py
    results/
      generated.csv
      generated_post.csv
      eval.json
      eval_post.json
      cont_stats.json
      disc_vocab.json
      metrics_history.csv
```

---
## 16. Summary

The current project is a **hybrid diffusion system** with a **two‑stage temporal + residual design**, built to balance **distribution alignment** and **temporal realism**. The architecture is modular, with explicit type‑aware diagnostics and postprocessing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace the KS‑only postprocessing with **conditional, structurally consistent generators** for the Type1/2/3/5/6 features.