Rewrite report as full user manual
377
report.md
@@ -1,218 +1,239 @@
|
|||||||
# mask-ddpm Project Report (Detailed)
|
# mask-ddpm Project Manual (Complete, Detailed Edition)
|
||||||
|
|
||||||
This report is a **complete, beginner‑friendly** description of the current project implementation as of the latest code in this repo. It explains **what the project does**, **how data flows**, **what each file is for**, and **why the architecture is designed this way**.
|
> This document is a complete, manual-level description for readers who are new to the project.
|
||||||
|
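> The goal is for **readers unfamiliar with diffusion or time-series modeling** to understand what the project is, how to run it, what each file does, what each step trains, and why it is designed this way.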
|
||||||
|
>
|
||||||
|
> Scope: the code currently in this repository, with `example/config.json` as the main configuration.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 0. TL;DR / One‑Sentence Overview
|
## Table of Contents
|
||||||
|
1. Project Goals and Research Questions
|
||||||
We generate multivariate ICS time‑series by **(1) learning temporal trend with GRU** and **(2) learning residuals with a hybrid diffusion model** (continuous DDPM + discrete masked diffusion). We then evaluate with **tie‑aware KS** and run **Type‑aware postprocessing** for diagnostic KS reduction.
|
2. Data and Feature Structure
|
||||||
|
3. Preprocessing and Statistics Files
|
||||||
|
4. Overall Model Architecture
|
||||||
|
5. Training Flow (Step by Step)
|
||||||
|
6. Sampling and Export Flow
|
||||||
|
7. Evaluation Framework and Metrics
|
||||||
|
8. Diagnostic Tools and Common Scripts
|
||||||
|
9. Type‑aware Design (Divide and Conquer by Type)
|
||||||
|
10. One‑Command Run and Common Commands
|
||||||
|
11. Output Files
|
||||||
|
12. Current Configuration and Key Hyperparameters
|
||||||
|
13. FAQ and Why Runs Are Slow
|
||||||
|
14. Known Limitations and Future Directions
|
||||||
|
15. File Tree (Condensed)
|
||||||
|
16. File Responsibilities (File by File)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 1. Project Goal
|
## 1. Project Goals and Research Questions
|
||||||
|
|
||||||
We want synthetic ICS sequences that are:
|
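Project goal: generate multivariate time-series data for industrial control systems (ICS) that satisfies the following three requirements: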
|
||||||
1) **Distribution‑aligned** (per‑feature CDF matches real data → low KS)
|
|
||||||
2) **Temporally consistent** (lag‑1 correlation and trend are realistic)
|
|
||||||
3) **Discrete‑valid** (state tokens are legal and frequency‑consistent)
|
|
||||||
|
|
||||||
This is hard because **distribution** and **temporal structure** often conflict in a single model.
|
- **Distribution consistency**: each variable's statistical distribution is close to the real data (measured by KS)
|
||||||
|
- **Temporal consistency**: the sequence structure is plausible; lag‑1 correlation and trend match the real data
|
||||||
|
- **Discrete validity**: discrete variables (states/modes) must be legal tokens with a plausible distribution (measured by JSD)
|
||||||
|
|
||||||
|
Core difficulties:
|
||||||
|
- Temporal structure and distribution alignment often conflict with each other
|
||||||
|
- The real data contains program-driven and event-driven variables, which a pure DDPM struggles to learn
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 2. Data & Feature Schema
|
## 2. Data and Feature Structure
|
||||||
|
|
||||||
**Input data**: HAI CSV files (compressed) in `dataset/hai/hai-21.03/`.
|
**Data source**: HAI `train*.csv.gz` (multiple files)
|
||||||
|
|
||||||
**Feature split**: `example/feature_split.json`
|
**Feature split** (see `example/feature_split.json`):
|
||||||
- `continuous`: real‑valued sensors/actuators
|
- `continuous`: continuous variables (sensors/actuators)
|
||||||
- `discrete`: state tokens / modes
|
- `discrete`: discrete variables (states/modes)
|
||||||
- `time_column`: time index (not trained)
|
- `time_column`: time column (excluded from training)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 3. Preprocessing
|
## 3. Preprocessing and Statistics Files
|
||||||
|
|
||||||
File: `example/prepare_data.py`
|
Script: `example/prepare_data.py`
|
||||||
|
|
||||||
### Continuous features
|
### 3.1 Continuous Variables
|
||||||
- Mean/std statistics
|
- Compute mean/std
|
||||||
- Quantile table (if `use_quantile_transform=true`)
|
- If `use_quantile_transform` is enabled: compute the quantile table (CDF)
|
||||||
- Optional transforms (log1p etc.)
|
- Output: `example/results/cont_stats.json`
|
||||||
- Output: `example/results/cont_stats.json`
|
|
||||||
|
|
||||||
### Discrete features
|
### 3.2 Discrete Variables
|
||||||
- Token vocab from data
|
- Collect the vocab from the data
|
||||||
- Output: `example/results/disc_vocab.json`
|
- Output: `example/results/disc_vocab.json`
|
||||||
|
|
||||||
File: `example/data_utils.py` contains
|
### 3.3 Data Utilities
|
||||||
- Normalization / inverse
|
`example/data_utils.py` provides:
|
||||||
- Quantile transform / inverse
|
- Standardization / inverse standardization
|
||||||
- Post‑calibration helpers
|
- Quantile transform / inverse transform
|
||||||
|
- Optional post-calibration (quantile calibration)
|
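The quantile transform and its inverse can be sketched as follows. This is a minimal illustration, not the code in `example/data_utils.py`: the function names, the table size, and the linear interpolation between knots are all assumptions made for the example.

```python
from bisect import bisect_right


def build_quantile_table(values, n_quantiles=101):
    """Evenly spaced empirical quantiles of `values` (linear interpolation)."""
    xs = sorted(values)
    last = len(xs) - 1
    table = []
    for i in range(n_quantiles):
        pos = i * last / (n_quantiles - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        table.append(xs[lo] * (1.0 - frac) + xs[hi] * frac)
    return table


def to_uniform(x, table):
    """Map a raw value to its approximate CDF position in [0, 1]."""
    if x <= table[0]:
        return 0.0
    if x >= table[-1]:
        return 1.0
    hi = bisect_right(table, x)  # first knot strictly greater than x
    lo = hi - 1
    frac = (x - table[lo]) / (table[hi] - table[lo])
    return (lo + frac) / (len(table) - 1)


def from_uniform(u, table):
    """Inverse of to_uniform: map a CDF position back to a raw value."""
    u = min(max(u, 0.0), 1.0)
    pos = u * (len(table) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(table) - 1)
    frac = pos - lo
    return table[lo] * (1.0 - frac) + table[hi] * frac


# Hypothetical data: 101 evenly spaced readings, summarized by 11 knots.
table = build_quantile_table([float(v) for v in range(101)], n_quantiles=11)
u = to_uniform(25.0, table)
x_back = from_uniform(u, table)
```

Values outside the observed range are clamped to the table's end points, so the round trip is only exact inside the range seen during preprocessing.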
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 4. Architecture
|
## 4. Overall Model Architecture
|
||||||
|
|
||||||
### 4.1 Stage‑1 Temporal GRU (Trend)
|
The project uses a **two-stage + hybrid diffusion** architecture:
|
||||||
File: `example/hybrid_diffusion.py`
|
|
||||||
- Class: `TemporalGRUGenerator`
|
|
||||||
- Input: continuous sequence
|
|
||||||
- Output: **trend sequence** (teacher forced)
|
|
||||||
- Purpose: capture temporal structure
|
|
||||||
|
|
||||||
### 4.2 Stage‑2 Hybrid Diffusion (Residual)
|
### 4.1 Stage‑1 Temporal GRU
|
||||||
File: `example/hybrid_diffusion.py`
|
- Purpose: learn the sequence trend and temporal structure
|
||||||
|
- Input: continuous-variable sequence
|
||||||
|
- Output: trend (the trend sequence)
|
||||||
|
|
||||||
**Continuous branch**
|
### 4.2 Stage‑2 Hybrid Diffusion
|
||||||
- Gaussian DDPM
|
- Purpose: learn the residual distribution (decoupling temporal structure from distribution)
|
||||||
- Predicts **residual** (or noise)
|
- Continuous variables: Gaussian DDPM
|
||||||
|
- Discrete variables: mask diffusion with a classification head
|
||||||
|
|
||||||
**Discrete branch**
|
### 4.3 Backbone Choice
|
||||||
- Mask diffusion (masked tokens)
|
- Current configuration: `backbone_type = transformer`
|
||||||
- Classifier head per discrete column
|
- Alternative: GRU (lower GPU memory use, more stable)
|
||||||
|
|
||||||
**Backbone**
|
|
||||||
- Current config uses **Transformer encoder** (`backbone_type=transformer`)
|
|
||||||
- GRU is still supported as option
|
|
||||||
|
|
||||||
**Conditioning**
|
|
||||||
- File‑id conditioning (`use_condition=true`, `condition_type=file_id`)
|
|
||||||
- Type‑1 (setpoint/demand) can be passed as **continuous condition** (`cond_cont`)
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 5. Training Flow
|
## 5. Training Flow (Step by Step)
|
||||||
File: `example/train.py`
|
|
||||||
|
|
||||||
### 5.1 Stage‑1 Temporal training
|
Script: `example/train.py`
|
||||||
- Use continuous features (excluding Type1/Type5)
|
|
||||||
- Teacher‑forced GRU predicts next step
|
|
||||||
- Loss: **MSE**
|
|
||||||
- Output: `temporal.pt`
|
|
||||||
|
|
||||||
### 5.2 Stage‑2 Diffusion training
|
### Step 1: Temporal Training
|
||||||
- Compute residual: `x_resid = x_cont - trend`
|
- Input: continuous sequences
|
||||||
- Sample time step `t`
|
- The GRU predicts the next step under teacher forcing
|
||||||
- Add noise for continuous; mask tokens for discrete
|
- Loss: MSE
|
||||||
- Model predicts:
|
- Output: `temporal.pt`
|
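Teacher forcing with an MSE loss amounts to feeding the ground-truth sequence as input and scoring the prediction against the same sequence shifted by one step. The sketch below shows only that data arrangement; the "model" is a deliberately trivial placeholder (an assumption for illustration), not the GRU used by the project.

```python
def teacher_forcing_pairs(sequence):
    """Inputs are steps 0..L-2; targets are the same sequence shifted by one."""
    return sequence[:-1], sequence[1:]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


sequence = [0.0, 1.0, 2.0, 3.0]
inputs, targets = teacher_forcing_pairs(sequence)

# Stand-in "model": predict that the next value equals the current one.
# The real Stage-1 model is a GRU; this placeholder only exercises the loss.
predictions = list(inputs)
loss = mse(predictions, targets)
```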
||||||
- **eps_pred** for continuous residual
|
|
||||||
- logits for discrete tokens
|
|
||||||
|
|
||||||
### Loss design
|
### Step 2: Diffusion Training
|
||||||
- Continuous loss: MSE on eps or x0 (`cont_target`)
|
- Compute the residual: `x_resid = x_cont - trend`
|
||||||
- Optional weighting: inverse variance (`cont_loss_weighting=inv_std`)
|
- Sample a timestep `t`
|
||||||
- Optional SNR weighting (`snr_weighted_loss`)
|
- Continuous: add noise
|
||||||
- Optional quantile loss (align residual distribution)
|
- Discrete: mask tokens
|
||||||
- Optional residual mean/std loss
|
- The model predicts eps / logits
|
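The corruption half of a Stage-2 training step (residual, timestep, Gaussian noise, token masking) can be sketched in plain Python. Several details here are assumptions rather than facts about `example/train.py`: the linear beta schedule and its range, the masking probability `(t + 1) / timesteps`, the `mask_id` value, and the hard-coded `trend` that stands in for the Stage-1 output.

```python
import math
import random


def alpha_bar_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    out, prod = [], 1.0
    for i in range(timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (timesteps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_residual(x_resid, t, alpha_bar, rng):
    """Gaussian forward step: x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x_resid]
    x_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e
           for x, e in zip(x_resid, eps)]
    return x_t, eps


def mask_tokens(tokens, t, timesteps, mask_id, rng):
    """Replace each token by mask_id with probability (t + 1) / timesteps (assumed)."""
    p = (t + 1) / timesteps
    masked = [rng.random() < p for _ in tokens]
    corrupted = [mask_id if m else tok for tok, m in zip(tokens, masked)]
    return corrupted, masked


rng = random.Random(0)
T = 600                                   # matches `timesteps` in the config
alpha_bar = alpha_bar_schedule(T)

x_cont = [1.0, 1.2, 1.1, 1.4]
trend = [1.0, 1.1, 1.2, 1.3]              # stand-in for the Stage-1 trend
x_resid = [x - tr for x, tr in zip(x_cont, trend)]

t = rng.randrange(T)                      # sample a timestep
x_t, eps = noise_residual(x_resid, t, alpha_bar, rng)
tokens_t, masked = mask_tokens([0, 2, 1, 2], t, T, mask_id=3, rng=rng)
```

The model would then receive `x_t`, `tokens_t`, and `t`, and be trained to recover `eps` (or `x0`) for the continuous part and the original tokens at the masked positions.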
||||||
- Discrete loss: cross‑entropy on masked tokens
|
|
||||||
- Total: `loss = λ * loss_cont + (1‑λ) * loss_disc`
|
### Loss Design
|
||||||
|
- Continuous: MSE (on eps or x0)
|
||||||
|
- Discrete: cross-entropy (on masked positions)
|
||||||
|
- Total loss: `loss = λ * loss_cont + (1-λ) * loss_disc`
|
||||||
|
- Optional weighting and auxiliary terms:
|
||||||
|
- inverse‑std
|
||||||
|
- SNR‑weighted
|
||||||
|
- quantile loss
|
||||||
|
- residual stat loss
|
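The base loss, without any of the optional terms, can be written out directly from the formula `loss = λ * loss_cont + (1-λ) * loss_disc`. This is a minimal sketch on plain Python lists; the function names are hypothetical and the project's tensor-based implementation may differ.

```python
import math


def mse(pred, target):
    """Mean squared error for the continuous branch (eps or x0 target)."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


def masked_cross_entropy(logits, targets, masked):
    """Mean cross-entropy over masked positions only."""
    losses = []
    for row, y, m in zip(logits, targets, masked):
        if not m:
            continue  # unmasked tokens do not contribute to the loss
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        losses.append(log_z - row[y])
    return sum(losses) / len(losses) if losses else 0.0


def hybrid_loss(loss_cont, loss_disc, lam):
    """Convex combination of the two branch losses."""
    return lam * loss_cont + (1.0 - lam) * loss_disc


loss_cont = mse([1.0, 2.0], [1.0, 4.0])
loss_disc = masked_cross_entropy(
    logits=[[0.0, 0.0], [5.0, 0.0]],
    targets=[0, 1],
    masked=[True, False],   # only the first position is scored
)
total = hybrid_loss(loss_cont, loss_disc, lam=0.5)
```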
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 6. Sampling & Export
|
## 6. Sampling and Export Flow
|
||||||
File: `example/export_samples.py`
|
|
||||||
|
|
||||||
Steps:
|
Script: `example/export_samples.py`
|
||||||
1) Initialize continuous with noise
|
|
||||||
2) Initialize discrete with masks
|
Flow:
|
||||||
3) Reverse diffusion loop from `t=T..0`
|
1) Initialize noise (continuous)
|
||||||
4) Add trend back (if temporal stage enabled)
|
2) Initialize masks (discrete)
|
||||||
5) Inverse transforms (quantile → raw)
|
3) Reverse diffusion, t=T..0
|
||||||
6) Clip/bound if configured
|
4) Add the trend back
|
||||||
7) Merge back Type1 (conditioning) and Type5 (derived)
|
5) Inverse transforms (quantile / standardization)
|
||||||
8) Write `generated.csv`
|
6) Assemble the CSV
|
||||||
|
|
||||||
|
Output: `example/results/generated.csv`
|
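The order of operations in the sampling flow can be sketched as a skeleton. Everything model-specific is passed in as a callable: `denoise_step` and `unmask_step` are hypothetical placeholders for the trained networks, the loop uses 0-based timestep indices, and only the standardization inverse is shown (the quantile inverse is omitted).

```python
import random


def sample_sequence(denoise_step, unmask_step, trend, mean, std,
                    timesteps, mask_id, n_disc, rng):
    """Skeleton of the sampling order; the step functions are supplied by the caller."""
    x = [rng.gauss(0.0, 1.0) for _ in trend]   # 1) continuous starts as noise
    tokens = [mask_id] * n_disc                # 2) discrete starts fully masked
    for t in reversed(range(timesteps)):       # 3) reverse diffusion
        x = denoise_step(x, t)
        tokens = unmask_step(tokens, t)
    x = [r + tr for r, tr in zip(x, trend)]    # 4) add the trend back
    x = [v * std + mean for v in x]            # 5) undo standardization
    return x, tokens


# Toy stand-ins so the skeleton can run: the "denoiser" returns a zero
# residual and the "unmasker" fills every position with token 1.
x_out, tokens_out = sample_sequence(
    denoise_step=lambda x, t: [0.0] * len(x),
    unmask_step=lambda tokens, t: [1] * len(tokens),
    trend=[1.0, 2.0],
    mean=10.0,
    std=2.0,
    timesteps=5,
    mask_id=3,
    n_disc=3,
    rng=random.Random(0),
)
```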
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 7. Evaluation
|
## 7. Evaluation Framework and Metrics
|
||||||
File: `example/evaluate_generated.py`
|
|
||||||
|
|
||||||
### Metrics
|
Script: `example/evaluate_generated.py`
|
||||||
- **KS (tie‑aware)** for continuous
|
|
||||||
- **JSD** for discrete
|
|
||||||
- **lag‑1 correlation** for temporal consistency
|
|
||||||
- quantile diffs, mean/std errors
|
|
||||||
|
|
||||||
### Important
|
### Continuous Metrics
|
||||||
- Reference supports **glob** and aggregates **all matched files**
|
- **KS (tie‑aware)**
|
||||||
- KS implementation is **tie‑aware** (correct for spiky/quantized data)
|
- quantile diff
|
||||||
|
- lag‑1 correlation
|
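One way to make the two-sample KS statistic tie-aware is to compare the empirical CDFs only after every copy of a repeated value has been consumed from both samples. The sketch below takes that approach and adds a plain lag‑1 autocorrelation; it is an illustration under that assumption, not necessarily how `example/evaluate_generated.py` computes either metric.

```python
import math


def ks_tie_aware(a, b):
    """Two-sample KS statistic, evaluated only at distinct values."""
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        while i < len(xs) and xs[i] == v:   # consume all ties in a
            i += 1
        while j < len(ys) and ys[j] == v:   # consume all ties in b
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def lag1_corr(x):
    """Pearson correlation between x[t] and x[t+1]; 0.0 for constant input."""
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


ks_same = ks_tie_aware([1, 1, 1, 2], [1, 1, 1, 2])
ks_shift = ks_tie_aware([1, 1, 2, 2], [1, 2, 2, 2])
rho = lag1_corr([1.0, 2.0, 3.0, 4.0, 5.0])
```

A naive merge that updates the maximum after every single element can report a spurious gap in the middle of a run of equal values, which matters for spiky or quantized ICS signals.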
||||||
|
|
||||||
Outputs:
|
### Discrete Metrics
|
||||||
- `example/results/eval.json`
|
- JSD
|
||||||
|
- Invalid-token ratio
|
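Both discrete metrics reduce to counting tokens. The following sketch computes the Jensen-Shannon divergence (base 2, so the value lies in [0, 1]) between real and generated token frequencies, plus the share of generated tokens missing from the vocab. The function names and the choice of log base are assumptions for illustration.

```python
import math
from collections import Counter


def jsd(real_tokens, gen_tokens):
    """Jensen-Shannon divergence (base 2) between two token frequency tables."""
    p, q = Counter(real_tokens), Counter(gen_tokens)
    n_p, n_q = len(real_tokens), len(gen_tokens)
    total = 0.0
    for tok in set(p) | set(q):
        pi, qi = p[tok] / n_p, q[tok] / n_q
        mi = 0.5 * (pi + qi)
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total


def invalid_ratio(gen_tokens, vocab):
    """Fraction of generated tokens that are not in the vocab."""
    valid = set(vocab)
    return sum(tok not in valid for tok in gen_tokens) / len(gen_tokens)


same = jsd(["on", "off"], ["on", "off"])
disjoint = jsd(["on"], ["off"])
bad = invalid_ratio(["on", "??", "off", "!!"], vocab=["on", "off"])
```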
||||||
|
|
||||||
|
### Reading the Reference Data
|
||||||
|
- Supports `train*.csv.gz` globs
|
||||||
|
- Automatically aggregates all matched files
|
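Glob-based aggregation of compressed reference files needs only the standard library. The helper below is a hypothetical sketch (its name and signature are not taken from the repository) that concatenates one numeric column across every file matching a pattern.

```python
import csv
import glob
import gzip


def load_reference_column(pattern, column):
    """Concatenate one numeric column from every gzip CSV matching `pattern`."""
    values = []
    for path in sorted(glob.glob(pattern)):   # sorted for a stable file order
        with gzip.open(path, "rt", newline="") as fh:
            for row in csv.DictReader(fh):
                values.append(float(row[column]))
    return values


# Example call (the directory and column name are placeholders):
# reference = load_reference_column("path/to/train*.csv.gz", "some_column")
```

Reading every matched file on each evaluation is thorough but costly, which is one reason evaluation time grows with the size of the training set.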
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 8. Diagnostics
|
## 8. Diagnostic Tools and Common Scripts
|
||||||
|
|
||||||
- `example/diagnose_ks.py`: CDF plots and per‑feature KS
|
- `diagnose_ks.py`: CDF visualization
|
||||||
- `example/ranked_ks.py`: ranked KS + contribution
|
- `ranked_ks.py`: features ranked by KS contribution
|
||||||
- `example/filtered_metrics.py`: filtered KS excluding outliers
|
- `filtered_metrics.py`: KS after filtering out anomalous features
|
||||||
- `example/program_stats.py`: Type‑1 stats
|
- `program_stats.py`: Type1 statistics
|
||||||
- `example/controller_stats.py`: Type‑2 stats
|
- `controller_stats.py`: Type2 statistics
|
||||||
- `example/actuator_stats.py`: Type‑3 stats
|
- `actuator_stats.py`: Type3 statistics
|
||||||
- `example/pv_stats.py`: Type‑4 stats
|
- `pv_stats.py`: Type4 statistics
|
||||||
- `example/aux_stats.py`: Type‑6 stats
|
- `aux_stats.py`: Type6 statistics
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 9. Type‑Aware Modeling / Separation by Type
|
## 9. Type‑aware Design (Divide and Conquer by Type)
|
||||||
|
|
||||||
To reduce KS dominated by a few variables, the project uses **Type categories** defined in config:
|
In real ICS data, some variables are hard for a DDPM to learn, so features are divided into types:
|
||||||
- **Type1**: setpoints / demand (schedule‑driven)
|
|
||||||
- **Type2**: controller outputs
|
|
||||||
- **Type3**: actuator positions
|
|
||||||
- **Type4**: PV sensors
|
|
||||||
- **Type5**: derived tags
|
|
||||||
- **Type6**: auxiliary / coupling
|
|
||||||
|
|
||||||
### Current implementation (diagnostic KS baseline)
|
- **Type1**: setpoint/demand (schedule-driven)
|
||||||
File: `example/postprocess_types.py`
|
- **Type2**: controller outputs
|
||||||
- Type1/2/3/5/6 → **empirical resampling** from real distribution
|
- **Type3**: actuator positions
|
||||||
- Type4 → keep diffusion output
|
- **Type4**: PV sensors
|
||||||
|
- **Type5**: derived tags
|
||||||
|
- **Type6**: aux/coupling
|
||||||
|
|
||||||
This is **not** the final model, but provides a **KS‑upper bound** for diagnosis.
|
Script: `example/postprocess_types.py`
|
||||||
|
|
||||||
Outputs:
|
The current implementation is a **KS‑only baseline**:
|
||||||
- `example/results/generated_post.csv`
|
- Type1/2/3/5/6 → empirical resampling
|
||||||
- `example/results/eval_post.json`
|
- Type4 → still uses the diffusion output
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
- Quickly diagnose the best KS achievable (an upper bound on KS performance)
|
||||||
|
- Does not guarantee a realistic joint distribution
|
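Empirical resampling by type can be sketched as a per-column operation: columns in the resampled types are redrawn from the real values, and the remaining columns keep the diffusion output. The function and column names below are hypothetical, and sampling with replacement is an assumption.

```python
import random


def resample_columns(generated, real, resample_cols, rng):
    """Redraw the listed columns from real data; keep all other columns as generated."""
    out = {}
    for col, gen_values in generated.items():
        if col in resample_cols:
            # Each column is drawn independently, so cross-column
            # relationships in the real data are not preserved.
            out[col] = [rng.choice(real[col]) for _ in gen_values]
        else:
            out[col] = list(gen_values)
    return out


rng = random.Random(0)
generated = {"setpoint": [9.0, 9.0, 9.0], "pv": [0.1, 0.2, 0.3]}
real = {"setpoint": [1.0, 2.0], "pv": [5.0, 6.0]}
post = resample_columns(generated, real, resample_cols={"setpoint"}, rng=rng)
```

Because each column is drawn on its own, the marginal distributions match the real data closely while the joint distribution does not, which is why this serves as a diagnostic baseline rather than a final generator.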
||||||
|
|
||||||
|
Output: `example/results/generated_post.csv`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 10. Pipeline / One‑Command Flow
|
## 10. One‑Command Run and Common Commands
|
||||||
|
|
||||||
File: `example/run_all.py`
|
### Full Pipeline (Recommended)
|
||||||
|
|
||||||
Default pipeline:
|
|
||||||
1) prepare_data
|
|
||||||
2) train
|
|
||||||
3) export_samples
|
|
||||||
4) evaluate_generated (generated.csv)
|
|
||||||
5) postprocess_types (generated_post.csv)
|
|
||||||
6) evaluate_generated (eval_post.json)
|
|
||||||
7) diagnostics scripts
|
|
||||||
|
|
||||||
**Linux**:
|
|
||||||
```bash
|
```bash
|
||||||
python example/run_all.py --device cuda --config example/config.json
|
python example/run_all.py --device cuda --config example/config.json
|
||||||
```
|
```
|
||||||
|
|
||||||
**Windows (PowerShell)**:
|
### Evaluate Only (No Training)
|
||||||
```powershell
|
```bash
|
||||||
python run_all.py --device cuda --config config.json
|
python example/run_all.py --skip-prepare --skip-train --skip-export
|
||||||
|
```
|
||||||
|
|
||||||
|
### Train Only (No Evaluation)
|
||||||
|
```bash
|
||||||
|
python example/run_all.py --skip-eval --skip-postprocess --skip-post-eval --skip-diagnostics
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 11. Current Configuration (Key Defaults)
|
## 11. Output Files
|
||||||
From `example/config.json`:
|
|
||||||
|
- `generated.csv`: raw diffusion output
|
||||||
|
- `generated_post.csv`: output after KS‑only postprocessing
|
||||||
|
- `eval.json`: evaluation of the raw output
|
||||||
|
- `eval_post.json`: evaluation of the postprocessed output
|
||||||
|
- `cont_stats.json` / `disc_vocab.json`: statistics files
|
||||||
|
- `*_stats.json`: per-Type statistics reports
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 12. Current Configuration (Key Hyperparameters)
|
||||||
|
|
||||||
|
From `example/config.json`:
|
||||||
- backbone_type: **transformer**
|
- backbone_type: **transformer**
|
||||||
- timesteps: 600
|
- timesteps: 600
|
||||||
- seq_len: 96
|
- seq_len: 96
|
||||||
- batch_size: 16
|
- batch_size: 16
|
||||||
- cont_target: `x0`
|
- cont_target: x0
|
||||||
- cont_loss_weighting: `inv_std`
|
- cont_loss_weighting: inv_std
|
||||||
- snr_weighted_loss: true
|
- snr_weighted_loss: true
|
||||||
- quantile_loss_weight: 0.2
|
- quantile_loss_weight: 0.2
|
||||||
- use_quantile_transform: true
|
- use_quantile_transform: true
|
||||||
@@ -221,41 +242,30 @@ From `example/config.json`:
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 12. What’s Actually Trained vs What’s Post‑Processed
|
## 13. Why Runs Are Slow
|
||||||
|
|
||||||
**Trained**
|
1) Two-stage training (temporal + diffusion)
|
||||||
- Temporal GRU (trend)
|
2) Evaluation reads the full set of `train*.csv.gz` files
|
||||||
- Diffusion residual model (continuous + discrete)
|
3) `run_all` runs every diagnostic script by default
|
||||||
|
4) Large `timesteps` / `seq_len`
|
||||||
**Post‑Processed (KS‑only)**
|
|
||||||
- Type1/2/3/5/6 replaced by empirical resampling
|
|
||||||
|
|
||||||
This is important: postprocess improves KS but **may break joint realism**.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 13. Why It’s Still Hard / Current Difficulties
|
## 14. Known Limitations and Future Directions
|
||||||
|
|
||||||
- Type1/2/3 are **event‑driven** and **piecewise constant**
|
Limitations:
|
||||||
- Diffusion (Gaussian DDPM + MSE) tends to smooth/blur these
|
- Type1/2/3 still dominate KS
|
||||||
- Temporal vs distribution objectives pull in opposite directions
|
- The KS‑only baseline breaks the joint distribution
|
||||||
|
- There is a trade‑off between temporal structure and distribution
|
||||||
|
|
||||||
|
Directions:
|
||||||
|
- Build conditional models for Type1/2/3
|
||||||
|
- Add regime conditioning for Type4
|
||||||
|
- Joint metrics (cross‑feature correlation)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 14. Where To Improve Next
|
## 15. File Tree (Condensed)
|
||||||
|
|
||||||
1) Replace KS‑only postprocess with **conditional generators**:
|
|
||||||
- Type1: program generator (HMM / schedule)
|
|
||||||
- Type2: controller emulator (PID‑like)
|
|
||||||
- Type3: actuator dynamics (dwell + rate + saturation)
|
|
||||||
|
|
||||||
2) Add regime conditioning for Type4 PVs
|
|
||||||
|
|
||||||
3) Joint realism checks (cross‑feature correlation)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 15. Key Files (Complete but Pruned)
|
|
||||||
|
|
||||||
```
|
```
|
||||||
mask-ddpm/
|
mask-ddpm/
|
||||||
@@ -291,18 +301,25 @@ mask-ddpm/
|
|||||||
aux_stats.py
|
aux_stats.py
|
||||||
postprocess_types.py
|
postprocess_types.py
|
||||||
results/
|
results/
|
||||||
generated.csv
|
|
||||||
generated_post.csv
|
|
||||||
eval.json
|
|
||||||
eval_post.json
|
|
||||||
cont_stats.json
|
|
||||||
disc_vocab.json
|
|
||||||
metrics_history.csv
|
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 16. Summary
|
## 16. File Responsibilities (File by File)
|
||||||
|
|
||||||
The current project is a **hybrid diffusion system** with a **two‑stage temporal+residual design**, built to balance **distribution alignment** and **temporal realism**. The architecture is modular, with explicit type‑aware diagnostics and postprocessing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace KS‑only postprocessing with **conditional, structurally consistent generators** for Type1/2/3/5/6 features.
|
- `prepare_data.py`: computes statistics for continuous/discrete features
|
||||||
|
- `data_utils.py`: preprocessing and transform functions
|
||||||
|
- `hybrid_diffusion.py`: the model itself (Temporal + Diffusion)
|
||||||
|
- `train.py`: two-stage training
|
||||||
|
- `export_samples.py`: sampling and export
|
||||||
|
- `evaluate_generated.py`: evaluation metrics
|
||||||
|
- `run_all.py`: one-command pipeline
|
||||||
|
- `postprocess_types.py`: Type‑aware KS‑only baseline
|
||||||
|
- `diagnose_ks.py`: CDF diagnostics
|
||||||
|
- `ranked_ks.py`: KS ranking
|
||||||
|
- `filtered_metrics.py`: filtered KS
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
# End
|
||||||
|
If you need a more paper-style version (with formulas, pseudocode, and experiment tables), it can be appended later.
|
||||||
|
|||||||