update
@@ -1,7 +1,7 @@
# Example: HAI 21.03 Feature Split
This folder contains a small, reproducible example that inspects the HAI 21.03
-CSV (train1) and produces a continuous/discrete split using a simple heuristic.
+CSV (all train*.csv.gz files) and produces a continuous/discrete split using a simple heuristic.
## Files
- `analyze_hai21_03.py`: reads a sample of the data and writes results.
@@ -60,6 +60,12 @@ python example/run_pipeline.py --device auto
- Set `device` in `example/config.json` to `auto` or `cuda` when moving to a GPU machine.
- Attack label columns (`attack*`) are excluded from training and generation.
- The `time` column is always excluded from training and generation (optionally included only when exporting).
- EMA weights are saved as `model_ema.pt` and used by the pipeline for sampling.
- Gradients are clipped by default (`grad_clip` in `config.json`) to stabilize training.
- Discrete masking uses a cosine schedule for smoother corruption.
- Continuous sampling is clipped in normalized space each step for stability.
- Optional conditioning by file id (`train*.csv.gz`) is enabled by default for multi-file training.
- Continuous head can be bounded with `tanh` via `use_tanh_eps` in config.
- The analysis script samples only the first 5000 rows to stay fast.
- `prepare_data.py` runs without PyTorch, but `train.py` and `sample.py` require it.
- `train.py` and `sample.py` auto-select GPU if available; otherwise they fall back to CPU.
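
The gradient-clipping note above follows the usual global-norm recipe (as in PyTorch's `clip_grad_norm_`). A minimal pure-Python sketch, where the function name and the flat-list gradient representation are illustrative, not the pipeline's actual code:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their global L2 norm does
    not exceed max_norm. The pipeline's actual threshold is the
    `grad_clip` value in config.json."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# [3.0, 4.0] has norm 5.0, so it is rescaled down to norm 1.0.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Gradients whose norm is already under the threshold pass through unchanged, so clipping only activates on unusually large updates.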
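
The cosine schedule for discrete masking can be sketched as below. This is one common cosine form; the exact schedule used by the pipeline is an assumption here, and the function name is illustrative:

```python
import math

def cosine_mask_fraction(t):
    """Fraction of discrete tokens masked at normalized time t in [0, 1].

    Rises smoothly from 0 (clean data) to 1 (fully masked), corrupting
    slowly at first rather than linearly.
    """
    return 1.0 - math.cos(0.5 * math.pi * t)

# No corruption at t=0; full masking at t=1.
start, end = cosine_mask_fraction(0.0), cosine_mask_fraction(1.0)
```

Compared with a linear ramp, the cosine shape keeps more tokens intact at early timesteps, which is what "smoother corruption" refers to.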