Rewrite report as full user manual
377
report.md
@@ -1,218 +1,239 @@
# mask-ddpm Project Report (Detailed)
# mask-ddpm Project Manual (Complete, Detailed Edition)

This report is a **complete, beginner-friendly** description of the project implementation as of the latest code in this repo. It explains **what the project does**, **how data flows**, **what each file is for**, and **why the architecture is designed this way**.
> This document is a manual-level, complete description aimed at readers who are new to the project.
> The goal is that **readers unfamiliar with diffusion or time-series modeling** can still understand what the project is, how to run it, what each file does, what each step trains, and why it is designed this way.
>
> Scope: the current repository code (with `example/config.json` as the primary configuration).

---

## 0. TL;DR / One-Sentence Overview

We generate multivariate ICS time series by **(1) learning the temporal trend with a GRU** and **(2) learning residuals with a hybrid diffusion model** (continuous DDPM + discrete masked diffusion). We then evaluate with **tie-aware KS** and run **type-aware postprocessing** as a diagnostic for KS reduction.
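## Table of Contents
1. Project goals and research questions
2. Data and feature structure
3. Preprocessing and statistics files
4. Overall model architecture
5. Training flow (step by step)
6. Sampling and export flow
7. Evaluation framework and metrics
8. Diagnostic tools and common scripts
9. Type-aware design (divide and conquer by type)
10. One-command run and common commands
11. Output files
12. Current configuration and key hyperparameters
13. FAQ and why runs are slow
14. Known limitations and future directions
15. File tree (pruned)
16. File responsibilities (file by file)

---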

## 1. Project Goal
## 1. Project goals and research questions

We want synthetic ICS sequences that are:
1) **Distribution-aligned** (per-feature CDF matches real data → low KS)
2) **Temporally consistent** (lag-1 correlation and trend are realistic)
3) **Discrete-valid** (state tokens are legal and frequency-consistent)
The goal of this project is to generate multivariate industrial control system (ICS) time series that satisfy three requirements:

This is hard because **distribution** and **temporal structure** often conflict in a single model.
- **Distribution consistency**: each variable's statistical distribution is close to the real one (measured by KS)
- **Temporal consistency**: the sequence structure is plausible, and lag-1 correlation and trend match the real data
- **Discrete validity**: discrete variables (states/modes) must be legal tokens with a reasonable distribution (measured by JSD)

Core difficulties:
- Temporal structure and distribution alignment often conflict with each other
- Real data contains program-driven and event-driven variables that a pure DDPM struggles to learn

---

## 2. Data & Feature Schema
## 2. Data and feature structure

**Input data**: HAI CSV files (compressed) in `dataset/hai/hai-21.03/`.
**Data source**: HAI `train*.csv.gz` (multiple files)

**Feature split**: `example/feature_split.json`
- `continuous`: real-valued sensors/actuators
- `discrete`: state tokens / modes
- `time_column`: time index (not trained)
**Feature split** (see `example/feature_split.json`):
- `continuous`: continuous variables (sensors/actuators)
- `discrete`: discrete variables (states/modes)
- `time_column`: time column (excluded from training)

---

## 3. Preprocessing
## 3. Preprocessing and statistics files

File: `example/prepare_data.py`
Script: `example/prepare_data.py`

### Continuous features
- Mean/std statistics
- Quantile table (if `use_quantile_transform=true`)
- Optional transforms (log1p etc.)
- Output: `example/results/cont_stats.json`
### 3.1 Continuous variables
- Compute mean/std
- If `use_quantile_transform` is enabled: compute the quantile table (CDF)
- Output: `example/results/cont_stats.json`

### Discrete features
- Token vocab from data
- Output: `example/results/disc_vocab.json`
### 3.2 Discrete variables
- Build the vocab from the data
- Output: `example/results/disc_vocab.json`
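
The statistics described above can be illustrated with a minimal sketch. It is not the code in `example/prepare_data.py`; the function names `continuous_stats` and `discrete_vocab`, the evenly spaced quantile grid, and the JSON layout are assumptions made for illustration only.

```python
import json
import statistics


def continuous_stats(values, n_quantiles=5):
    """Mean, std, and an evenly spaced quantile table for one continuous column."""
    ordered = sorted(values)
    table = []
    for i in range(n_quantiles):
        # Probability p in [0, 1] mapped to a fractional index into the sorted data.
        pos = (i / (n_quantiles - 1)) * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring order statistics.
        table.append(ordered[lo] * (1 - frac) + ordered[hi] * frac)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "quantiles": table,
    }


def discrete_vocab(tokens):
    """Map each distinct token to an integer id (sorted for reproducibility)."""
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}


cont = continuous_stats([1.0, 2.0, 3.0, 4.0, 5.0])
vocab = discrete_vocab(["open", "closed", "open", "fault"])
print(json.dumps({"cont": cont, "vocab": vocab}, indent=2))
```

A real run would compute these per feature across all `train*.csv.gz` files and typically use a much denser quantile grid.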

File: `example/data_utils.py` contains
- Normalization / inverse
- Quantile transform / inverse
- Post-calibration helpers
### 3.3 Data utilities
`example/data_utils.py` provides:
- Standardization / inverse standardization
- Quantile transform / inverse transform
- Optional post-calibration (quantile calibration)
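
To make the quantile transform and its inverse concrete, here is a small sketch under stated assumptions: the quantile table is a sorted list of knots at evenly spaced probabilities, and the forward transform maps a raw value to a CDF position in [0, 1]. The names `quantile_forward` and `quantile_inverse` are hypothetical, and the helpers in `example/data_utils.py` may differ (for example, they might map to a Gaussian rather than a uniform target).

```python
from bisect import bisect_right


def quantile_forward(x, table):
    """Map a raw value to its approximate CDF position in [0, 1]."""
    n = len(table)
    if x <= table[0]:
        return 0.0
    if x >= table[-1]:
        return 1.0
    hi = bisect_right(table, x)  # first knot strictly greater than x
    lo = hi - 1
    # table[lo] <= x < table[hi], so the span is strictly positive.
    frac = (x - table[lo]) / (table[hi] - table[lo])
    return (lo + frac) / (n - 1)


def quantile_inverse(u, table):
    """Map a CDF position in [0, 1] back to the raw value scale."""
    n = len(table)
    u = min(max(u, 0.0), 1.0)
    pos = u * (n - 1)
    lo = min(int(pos), n - 2)
    frac = pos - lo
    return table[lo] + frac * (table[lo + 1] - table[lo])


table = [0.0, 1.0, 2.0, 4.0, 10.0]  # a skewed toy quantile table
u = quantile_forward(3.0, table)
print(u, quantile_inverse(u, table))
```

The round trip is exact between knots; values outside the table range are clamped, which is one reason sampled extremes can never exceed the training range under this scheme.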

---

## 4. Architecture
## 4. Overall model architecture

### 4.1 Stage-1 Temporal GRU (Trend)
File: `example/hybrid_diffusion.py`
- Class: `TemporalGRUGenerator`
- Input: continuous sequence
- Output: **trend sequence** (teacher forced)
- Purpose: capture temporal structure
The project uses a **two-stage + hybrid diffusion** architecture:

### 4.2 Stage-2 Hybrid Diffusion (Residual)
File: `example/hybrid_diffusion.py`
### 4.1 Stage-1 Temporal GRU
- Purpose: learn the sequence trend and temporal structure
- Input: continuous variable sequence
- Output: trend (the trend sequence)

**Continuous branch**
- Gaussian DDPM
- Predicts **residual** (or noise)
### 4.2 Stage-2 Hybrid Diffusion
- Purpose: learn the residual distribution (decoupling temporal structure from distribution)
- Continuous variables: Gaussian DDPM
- Discrete variables: mask diffusion with a classification head

**Discrete branch**
- Mask diffusion (masked tokens)
- Classifier head per discrete column

**Backbone**
- Current config uses a **Transformer encoder** (`backbone_type=transformer`)
- GRU is still supported as an option

**Conditioning**
- File-id conditioning (`use_condition=true`, `condition_type=file_id`)
- Type-1 (setpoint/demand) can be passed as a **continuous condition** (`cond_cont`)
### 4.3 Backbone choice
- Current configuration: `backbone_type = transformer`
- Alternative: GRU (uses less GPU memory and is more stable)

---

## 5. Training Flow
File: `example/train.py`
## 5. Training flow (step by step)

### 5.1 Stage-1 Temporal training
- Use continuous features (excluding Type1/Type5)
- Teacher-forced GRU predicts the next step
- Loss: **MSE**
- Output: `temporal.pt`
Script: `example/train.py`

### 5.2 Stage-2 Diffusion training
- Compute residual: `x_resid = x_cont - trend`
- Sample time step `t`
- Add noise for continuous; mask tokens for discrete
- Model predicts:
  - **eps_pred** for continuous residual
  - logits for discrete tokens
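
The corruption step in Stage-2 can be sketched as follows. This is an illustrative sketch, not the code in `example/train.py`: the linear beta schedule, the mask probability `t / T`, the `<mask>` token, and the function names are all assumptions, since the project's actual schedules are not stated here. `T = 600` mirrors the `timesteps` value listed in the configuration section.

```python
import math
import random

T = 600  # number of diffusion steps
MASK = "<mask>"  # hypothetical mask token


def alpha_bar(t, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def corrupt_continuous(x_resid, t, rng):
    """DDPM forward step: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    a = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x_resid]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x_resid, eps)]
    return x_t, eps


def corrupt_discrete(tokens, t, rng):
    """Replace each token by MASK independently with probability t / T."""
    mask = [rng.random() < t / T for _ in tokens]
    return [MASK if m else tok for tok, m in zip(tokens, mask)], mask


rng = random.Random(0)
x_cont, trend = [1.2, 1.5, 1.1], [1.0, 1.3, 1.2]
x_resid = [x - tr for x, tr in zip(x_cont, trend)]  # residual fed to Stage-2
x_t, eps = corrupt_continuous(x_resid, t=300, rng=rng)
tokens_t, mask = corrupt_discrete(["open", "open", "closed"], t=300, rng=rng)
print(x_t, tokens_t)
```

The model is then trained to recover `eps` (or `x_0`) from `x_t`, and the original tokens at the positions where `mask` is true.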
### Step 1: Temporal training
- Input: continuous sequence
- GRU with teacher forcing predicts the next step
- Loss: MSE
- Output: `temporal.pt`

### Loss design
- Continuous loss: MSE on eps or x0 (`cont_target`)
- Optional per-feature weighting by inverse standard deviation (`cont_loss_weighting=inv_std`)
- Optional SNR weighting (`snr_weighted_loss`)
- Optional quantile loss (aligns the residual distribution)
- Optional residual mean/std loss
- Discrete loss: cross-entropy on masked tokens
- Total: `loss = λ * loss_cont + (1-λ) * loss_disc`
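
As a worked illustration of the total loss, the sketch below combines an MSE term with a cross-entropy term computed on masked positions only. It uses plain Python lists instead of tensors, and the function names are hypothetical; the optional weightings listed above are omitted.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def masked_cross_entropy(logits, targets, mask):
    """Average cross-entropy over positions where mask is True."""
    total, count = 0.0, 0
    for row, tgt, m in zip(logits, targets, mask):
        if not m:
            continue  # unmasked tokens do not contribute to the loss
        peak = max(row)
        # Numerically stable log-sum-exp of the logits.
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[tgt]
        count += 1
    return total / count if count else 0.0


def hybrid_loss(loss_cont, loss_disc, lam):
    """loss = lam * loss_cont + (1 - lam) * loss_disc."""
    return lam * loss_cont + (1.0 - lam) * loss_disc


loss_cont = mse([1.0, 2.0], [1.0, 4.0])
loss_disc = masked_cross_entropy([[0.0, 0.0], [5.0, 0.0]], [0, 1], [True, False])
print(hybrid_loss(loss_cont, loss_disc, lam=0.5))
```

With `lam` close to 1 the continuous residual dominates training; with `lam` close to 0 the discrete tokens do.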
### Step 2: Diffusion training
- Compute the residual: `x_resid = x_cont - trend`
- Sample a time step t
- Continuous: add noise
- Discrete: mask tokens
- The model predicts eps / logits

### Loss design
- Continuous: MSE (eps or x0)
- Discrete: cross-entropy (masked positions only)
- Total loss: `loss = λ * loss_cont + (1-λ) * loss_disc`
- Optional weighting:
  - inverse-std
  - SNR-weighted
  - quantile loss
  - residual stat loss

---

## 6. Sampling & Export
File: `example/export_samples.py`
## 6. Sampling and export flow

Steps:
1) Initialize continuous with noise
2) Initialize discrete with masks
3) Reverse diffusion loop from `t=T..0`
4) Add trend back (if temporal stage enabled)
5) Inverse transforms (quantile → raw)
6) Clip/bound if configured
7) Merge back Type1 (conditioning) and Type5 (derived)
8) Write `generated.csv`
Script: `example/export_samples.py`

Flow:
1) Initialize noise (continuous)
2) Initialize masks (discrete)
3) Reverse diffusion, t=T..0
4) Add the trend back
5) Inverse transforms (quantile / standardization)
6) Assemble the CSV

Output: `example/results/generated.csv`
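
The post-diffusion part of the flow (add the trend back, invert the transform, clip) can be sketched for one continuous feature as below. This assumes plain z-score standardization with `mean` and `std` taken from `cont_stats.json`; the function name `finalize_continuous` and the bound arguments are hypothetical, and when the quantile transform is enabled the inverse step would use the quantile table instead.

```python
def finalize_continuous(resid, trend, mean, std, lower=None, upper=None):
    """Add the trend back, undo z-score standardization, then clip to bounds."""
    out = []
    for r, tr in zip(resid, trend):
        value = (r + tr) * std + mean  # back to the raw value scale
        if lower is not None:
            value = max(value, lower)
        if upper is not None:
            value = min(value, upper)
        out.append(value)
    return out


raw = finalize_continuous(
    resid=[0.5, -3.0], trend=[0.5, 0.0], mean=10.0, std=2.0, lower=5.0, upper=20.0
)
print(raw)
```

Clipping keeps samples inside a plausible physical range, at the cost of piling probability mass onto the bounds if the model overshoots often.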

---

## 7. Evaluation
File: `example/evaluate_generated.py`
## 7. Evaluation framework and metrics

### Metrics
- **KS (tie-aware)** for continuous
- **JSD** for discrete
- **lag-1 correlation** for temporal consistency
- quantile diffs, mean/std errors
Script: `example/evaluate_generated.py`

### Important
- Reference supports **glob** and aggregates **all matched files**
- KS implementation is **tie-aware** (correct for spiky/quantized data)
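
One way to make a two-sample KS statistic tie-aware is to compare the two empirical CDFs only after all copies of each distinct value have been consumed from both samples. The sketch below shows that idea in pure Python; it is an illustration under that assumption and not necessarily how `example/evaluate_generated.py` implements it.

```python
def ks_tie_aware(real, synth):
    """Two-sample KS statistic, evaluated only at distinct values."""
    a, b = sorted(real), sorted(synth)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Consume every copy of v in both samples before comparing the CDFs,
        # so a run of tied values cannot create a spurious gap mid-run.
        while i < len(a) and a[i] == v:
            i += 1
        while j < len(b) and b[j] == v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


print(ks_tie_aware([1, 1, 1, 2], [1, 1, 2, 2]))
```

For quantized or spiky signals, a merge that advances one element at a time can report a gap in the middle of a run of equal values; evaluating only after the run avoids that inflation.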
### Continuous metrics
- **KS (tie-aware)**
- quantile diff
- lag-1 correlation
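
Lag-1 correlation is the Pearson correlation between a series and the same series shifted by one step. A minimal sketch follows; the function name is hypothetical, and returning 0.0 for a constant series is a convention chosen here, not something the project documents.

```python
import math


def lag1_corr(series):
    """Pearson correlation between series[:-1] and series[1:]."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # constant series: correlation is undefined
    return cov / math.sqrt(vx * vy)


print(lag1_corr([1.0, 2.0, 3.0, 4.0, 5.0]), lag1_corr([1.0, -1.0, 1.0, -1.0, 1.0]))
```

Comparing this value per feature between real and generated data gives a quick check on whether short-range temporal structure is preserved.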

Outputs:
- `example/results/eval.json`
### Discrete metrics
- JSD
- invalid token ratio
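
Both discrete metrics are simple to compute from token counts. The sketch below uses base-2 logarithms, so JSD lies in [0, 1]; the base and the function names are assumptions, and the project may use a different base or smoothing.

```python
import math
from collections import Counter


def jsd(real_tokens, synth_tokens):
    """Jensen-Shannon divergence (base 2) between two token frequency distributions."""
    p_counts, q_counts = Counter(real_tokens), Counter(synth_tokens)
    n_p, n_q = len(real_tokens), len(synth_tokens)
    total = 0.0
    for tok in set(p_counts) | set(q_counts):
        p, q = p_counts[tok] / n_p, q_counts[tok] / n_q
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


def invalid_rate(synth_tokens, vocab):
    """Fraction of generated tokens that are not in the training vocab."""
    return sum(tok not in vocab for tok in synth_tokens) / len(synth_tokens)


print(jsd(["a", "a", "b"], ["a", "b", "b"]))
print(invalid_rate(["a", "x", "b", "a"], {"a", "b"}))
```

A JSD of 0 means identical token frequencies; 1 means the two distributions share no tokens at all.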

### Reading the reference data
- Supports the `train*.csv.gz` glob
- Automatically aggregates all matched files

---

## 8. Diagnostics
## 8. Diagnostic tools and common scripts

- `example/diagnose_ks.py`: CDF plots and per-feature KS
- `example/ranked_ks.py`: ranked KS + contribution
- `example/filtered_metrics.py`: filtered KS excluding outliers
- `example/program_stats.py`: Type-1 stats
- `example/controller_stats.py`: Type-2 stats
- `example/actuator_stats.py`: Type-3 stats
- `example/pv_stats.py`: Type-4 stats
- `example/aux_stats.py`: Type-6 stats
- `diagnose_ks.py`: CDF visualization
- `ranked_ks.py`: features ranked by KS contribution
- `filtered_metrics.py`: KS after filtering out anomalous features
- `program_stats.py`: Type1 statistics
- `controller_stats.py`: Type2 statistics
- `actuator_stats.py`: Type3 statistics
- `pv_stats.py`: Type4 statistics
- `aux_stats.py`: Type6 statistics

---

## 9. Type-Aware Modeling
## 9. Type-aware design (divide and conquer by type)

To keep a few variables from dominating KS, the project uses **Type categories** defined in the config:
- **Type1**: setpoints / demand (schedule-driven)
- **Type2**: controller outputs
- **Type3**: actuator positions
- **Type4**: PV sensors
- **Type5**: derived tags
- **Type6**: auxiliary / coupling
In real ICS data, some variables are hard for a DDPM to learn, so features are split by type:

### Current implementation (diagnostic KS baseline)
File: `example/postprocess_types.py`
- Type1/2/3/5/6 → **empirical resampling** from the real distribution
- Type4 → keep diffusion output
- **Type1**: setpoint/demand (schedule-driven)
- **Type2**: controller outputs
- **Type3**: actuator positions
- **Type4**: PV sensors
- **Type5**: derived tags
- **Type6**: aux/coupling

This is **not** the final model. It provides a best-case KS reference for diagnosis (an upper bound on achievable KS quality), because features resampled from the real data match their real marginal distributions almost exactly.
Script: `example/postprocess_types.py`

Outputs:
- `example/results/generated_post.csv`
- `example/results/eval_post.json`
The current implementation is a **KS-only baseline**:
- Type1/2/3/5/6 → empirical resampling
- Type4 → still produced by diffusion
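
Empirical resampling itself is a very small operation, sketched below for columns held as Python lists. The function name `resample_types`, the `feature_types` mapping, and the type labels are hypothetical; `example/postprocess_types.py` reads its type assignments from the config and may resample differently.

```python
import random


def resample_types(generated, real, feature_types, keep_types=("type4",), seed=0):
    """Keep diffusion output for kept types; redraw all other columns from real values."""
    rng = random.Random(seed)
    n_rows = len(next(iter(generated.values())))
    out = {}
    for name, column in generated.items():
        if feature_types.get(name) in keep_types:
            out[name] = list(column)  # Type4: keep the model's samples
        else:
            # Independent draws with replacement from the real marginal distribution.
            out[name] = rng.choices(real[name], k=n_rows)
    return out


generated = {"pv": [0.1, 0.2, 0.3], "sp": [9.0, 9.0, 9.0]}
real = {"pv": [1.0, 2.0], "sp": [50.0, 60.0]}
post = resample_types(generated, real, {"pv": "type4", "sp": "type1"})
print(post)
```

Because each row is drawn independently, the resampled columns match the real marginals but lose both their temporal order and their relationship to other features, which is why this step is only a diagnostic.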

Purpose:
- Quickly diagnose the best KS that is achievable (the upper bound on KS quality)
- Does not guarantee a realistic joint distribution

Output: `example/results/generated_post.csv`

---

## 10. Pipeline / One-Command Flow
## 10. One-command run and common commands

File: `example/run_all.py`

Default pipeline:
1) prepare_data
2) train
3) export_samples
4) evaluate_generated (generated.csv)
5) postprocess_types (generated_post.csv)
6) evaluate_generated (eval_post.json)
7) diagnostics scripts

**Linux**:
### Full pipeline (recommended)
```bash
python example/run_all.py --device cuda --config example/config.json
```

**Windows (PowerShell)**:
```powershell
python run_all.py --device cuda --config config.json
```
### Evaluate only, without training
```bash
python example/run_all.py --skip-prepare --skip-train --skip-export
```

### Train only, without evaluation
```bash
python example/run_all.py --skip-eval --skip-postprocess --skip-post-eval --skip-diagnostics
```

---

## 11. Current Configuration (Key Defaults)
From `example/config.json`:
## 11. Output files

- `generated.csv`: raw diffusion output
- `generated_post.csv`: output after KS-only postprocessing
- `eval.json`: evaluation of the raw output
- `eval_post.json`: evaluation of the postprocessed output
- `cont_stats.json` / `disc_vocab.json`: statistics files
- `*_stats.json`: per-Type statistics reports

---

## 12. Current configuration (key hyperparameters)

From `example/config.json`:
- backbone_type: **transformer**
- timesteps: 600
- seq_len: 96
- batch_size: 16
- cont_target: `x0`
- cont_loss_weighting: `inv_std`
- cont_target: x0
- cont_loss_weighting: inv_std
- snr_weighted_loss: true
- quantile_loss_weight: 0.2
- use_quantile_transform: true
@@ -221,41 +242,30 @@ From `example/config.json`:

---

## 12. What’s Actually Trained vs What’s Post-Processed
## 13. Why runs are slow

**Trained**
- Temporal GRU (trend)
- Diffusion residual model (continuous + discrete)

**Post-Processed (KS-only)**
- Type1/2/3/5/6 replaced by empirical resampling

This is important: postprocessing improves KS but **may break joint realism**.
1) Two-stage training (temporal + diffusion)
2) Evaluation reads the full set of `train*.csv.gz` files
3) `run_all` runs every diagnostic script by default
4) Large `timesteps` / `seq_len`

---

## 13. Why It’s Still Hard / Current Difficulties
## 14. Known limitations and future directions

- Type1/2/3 are **event-driven** and **piecewise constant**
- Diffusion (Gaussian DDPM + MSE) tends to smooth/blur these
- Temporal vs distribution objectives pull in opposite directions
Limitations:
- Type1/2/3 still dominate KS
- The KS-only baseline breaks the joint distribution
- There is a trade-off between temporal structure and distribution

Directions:
- Build conditional models for Type1/2/3
- Add regime conditioning for Type4
- Joint metrics (cross-feature correlation)

---

## 14. Where To Improve Next / Next Steps

1) Replace KS-only postprocess with **conditional generators**:
   - Type1: program generator (HMM / schedule)
   - Type2: controller emulator (PID-like)
   - Type3: actuator dynamics (dwell + rate + saturation)

2) Add regime conditioning for Type4 PVs

3) Joint realism checks (cross-feature correlation)

---

## 15. Key Files (Complete but Pruned)
## 15. File tree (pruned)

```
mask-ddpm/
@@ -291,18 +301,25 @@ mask-ddpm/
    aux_stats.py
    postprocess_types.py
    results/
      generated.csv
      generated_post.csv
      eval.json
      eval_post.json
      cont_stats.json
      disc_vocab.json
      metrics_history.csv
```

---

## 16. Summary
## 16. File responsibilities (file by file)

The current project is a **hybrid diffusion system** with a **two-stage temporal+residual design**, built to balance **distribution alignment** and **temporal realism**. The architecture is modular, with explicit type-aware diagnostics and postprocessing, and supports both GRU and Transformer backbones. The remaining research challenge is to replace KS-only postprocessing with **conditional, structurally consistent generators** for Type1/2/3/5/6 features.
- `prepare_data.py`: computes statistics for continuous and discrete features
- `data_utils.py`: preprocessing and transform functions
- `hybrid_diffusion.py`: the model itself (Temporal + Diffusion)
- `train.py`: two-stage training
- `export_samples.py`: sampling and export
- `evaluate_generated.py`: evaluation metrics
- `run_all.py`: one-command pipeline
- `postprocess_types.py`: type-aware KS-only baseline
- `diagnose_ks.py`: CDF diagnostics
- `ranked_ks.py`: KS ranking
- `filtered_metrics.py`: filtered KS

---

# End
If you need a more paper-style version (with formulas, pseudocode, and experiment tables), it can be appended later.