2026-01-22 21:17:11 +08:00
parent 5a109f91ac
commit 178fb7441c
4 changed files with 102 additions and 12 deletions


@@ -66,6 +66,8 @@ python example/run_pipeline.py --device auto
- Continuous sampling is clipped in normalized space each step for stability.
- Optional conditioning by file id (`train*.csv.gz`) is enabled by default for multi-file training.
- Continuous head can be bounded with `tanh` via `use_tanh_eps` in config.
- Export now clamps continuous features to training min/max and preserves integer/decimal precision.
- `<UNK>` tokens are replaced by the most frequent token for each discrete column at export.
- To keep runs fast, the script samples only the first 5000 rows.
- `prepare_data.py` runs without PyTorch, but `train.py` and `sample.py` require it.
- `train.py` and `sample.py` auto-select GPU if available; otherwise they fall back to CPU.
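The per-step clipping of continuous samples in normalized space can be sketched as follows. This is a minimal illustration, not the pipeline's actual code; the bounds `[-1, 1]` are an assumption about the normalization range:

```python
import numpy as np

def clip_normalized(x, lo=-1.0, hi=1.0):
    # Hypothetical sketch: clamp a batch of continuous samples back into
    # the normalized range after each sampling step, so numerical drift
    # cannot push values outside the space the model was trained on.
    # The real pipeline's bounds may differ.
    return np.clip(x, lo, hi)

print(clip_normalized(np.array([-2.0, 0.3, 4.0])))
```

Applying this every step (rather than once at the end) keeps intermediate states well-conditioned, which is the stability benefit the note above describes.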
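One plausible reading of the `use_tanh_eps` option is a `tanh` squashed head scaled slightly beyond the unit range, so the output can actually reach the normalized bounds ±1. The name `tanh_bounded`, the default `eps`, and the exact formula here are assumptions for illustration, not the config's documented semantics:

```python
import math

def tanh_bounded(raw, eps=0.05):
    # Hypothetical interpretation of `use_tanh_eps`: squash the raw head
    # output into (-(1 + eps), 1 + eps). Plain tanh only approaches ±1
    # asymptotically; the eps margin lets the head hit the boundary values.
    return (1.0 + eps) * math.tanh(raw)

print(tanh_bounded(0.0), tanh_bounded(50.0))
```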
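The export-time treatment of continuous features could look like the sketch below: clamp to the training min/max, then round to the column's original precision. The helper name and the `decimals` parameter are hypothetical; the pipeline may track precision differently:

```python
import numpy as np

def clamp_and_round(values, train_min, train_max, decimals):
    # Hypothetical export step: generated values never leave the range
    # observed in training, and rounding restores the column's original
    # precision (decimals=0 keeps integer columns integral).
    clipped = np.clip(values, train_min, train_max)
    return np.round(clipped, decimals)

print(clamp_and_round(np.array([0.123, 9.9]), 0.5, 5.0, 2))
```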
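The `<UNK>` replacement rule at export can be sketched per column as below. The function name is hypothetical; the mode is taken over non-`<UNK>` tokens, which is an assumption about how ties with the placeholder itself are avoided:

```python
from collections import Counter

def replace_unk(column, unk="<UNK>"):
    # Hypothetical sketch of the export rule: swap every <UNK> token for
    # the most frequent real token in this discrete column. If the column
    # contains only <UNK>, there is no candidate and it is left unchanged.
    counts = Counter(t for t in column if t != unk)
    if not counts:
        return list(column)
    mode = counts.most_common(1)[0][0]
    return [mode if t == unk else t for t in column]

print(replace_unk(["a", "a", "b", "<UNK>"]))
```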