Clean artifacts and update example pipeline

2026-01-22 16:32:51 +08:00
parent c0639386be
commit c3f750cd9d
20 changed files with 651 additions and 30826 deletions

@@ -12,33 +12,54 @@ CSV (train1) and produces a continuous/discrete split using a simple heuristic.
- train_stub.py: end-to-end scaffold for loss computation.
- train.py: minimal training loop with checkpoints.
- sample.py: minimal sampling loop.
- export_samples.py: sample + export to CSV with original column names.
- evaluate_generated.py: basic eval of generated CSV vs training stats.
- config.json: training defaults for train.py.
- model_design.md: step-by-step design notes.
- results/feature_split.txt: comma-separated feature lists.
- results/summary.txt: basic stats (rows sampled, column counts).
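
For reference, `example/config.json` might look roughly like the following. Every field name and value here is an illustrative assumption, not the repo's actual schema; check the file itself:

```json
{
  "device": "auto",
  "batch_size": 256,
  "epochs": 10,
  "lr": 1e-3,
  "checkpoint_dir": "example/results"
}
```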
## Run
```
-python /home/anay/Dev/diffusion/mask-ddpm/example/analyze_hai21_03.py
+python example/analyze_hai21_03.py
```
Prepare vocab + stats (writes to `example/results`):
```
-python /home/anay/Dev/diffusion/mask-ddpm/example/prepare_data.py
+python example/prepare_data.py
```
Train a small run:
```
-python /home/anay/Dev/diffusion/mask-ddpm/example/train.py
+python example/train.py --config example/config.json
```
Sample from the trained model:
```
-python /home/anay/Dev/diffusion/mask-ddpm/example/sample.py
+python example/sample.py
```
Sample and export CSV:
```
python example/export_samples.py --include-time --device cpu
```
Evaluate generated CSV (writes eval.json):
```
python example/evaluate_generated.py
```
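A stats comparison of the kind `evaluate_generated.py` performs can be sketched as follows. The function name, column handling, and report layout here are assumptions for illustration, not the script's actual behavior:

```python
import json

import pandas as pd


def compare_stats(train_csv: str, generated_csv: str, out_path: str = "eval.json"):
    """Compare per-column mean/std of generated data against training data."""
    train = pd.read_csv(train_csv)
    gen = pd.read_csv(generated_csv)
    report = {}
    for col in train.select_dtypes(include="number").columns:
        if col not in gen.columns:
            continue  # column was excluded from generation (e.g. attack*, time)
        report[col] = {
            "train_mean": float(train[col].mean()),
            "gen_mean": float(gen[col].mean()),
            "train_std": float(train[col].std()),
            "gen_std": float(gen[col].std()),
        }
    with open(out_path, "w") as f:
        json.dump(report, f, indent=2)
    return report
```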
One-click pipeline (prepare -> train -> export -> eval -> plot):
```
python example/run_pipeline.py --device auto
```
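A one-click pipeline of this shape typically just runs each stage as a subprocess and stops on the first failure. A minimal sketch, where the stage list and error handling are assumptions and the real `run_pipeline.py` may differ (e.g. in how it forwards `--device`):

```python
import subprocess
import sys

# Ordered pipeline stages; each entry is an example/ script plus its arguments.
STAGES = [
    ["example/prepare_data.py"],
    ["example/train.py", "--config", "example/config.json"],
    ["example/export_samples.py"],
    ["example/evaluate_generated.py"],
]


def run_pipeline(stages=STAGES):
    for stage in stages:
        # Run each stage with the current interpreter; abort on first failure.
        result = subprocess.run([sys.executable, *stage])
        if result.returncode != 0:
            raise SystemExit(f"stage failed: {' '.join(stage)}")
```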
## Notes
- Heuristic: integer-like columns with low cardinality (<=10 unique values) are
  treated as discrete; all other numeric columns are continuous.
- Set `device` in `example/config.json` to `auto` or `cuda` when moving to a GPU machine.
- Attack label columns (`attack*`) are excluded from training and generation.
- The `time` column is always excluded from training and generation; it can optionally be included at export time via `--include-time`.
- The script samples only the first 5000 rows to keep runs fast.
- `prepare_data.py` runs without PyTorch, but `train.py` and `sample.py` require it.
- `train.py` and `sample.py` auto-select GPU if available; otherwise they fall back to CPU.
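
The discrete/continuous heuristic described above can be sketched like this; the function name and the exact integer-likeness check are assumptions, not the repo's implementation:

```python
import pandas as pd


def split_features(df: pd.DataFrame, max_cardinality: int = 10):
    """Split numeric columns into discrete vs. continuous feature lists.

    Integer-like columns with at most `max_cardinality` unique values are
    treated as discrete; all other numeric columns are continuous.
    """
    discrete, continuous = [], []
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        # A column is integer-like if every value equals its rounded self.
        integer_like = (values == values.round()).all()
        if integer_like and values.nunique() <= max_cardinality:
            discrete.append(col)
        else:
            continuous.append(col)
    return discrete, continuous
```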