Shortcomings of continuous features in temporal correlation
@@ -67,6 +67,7 @@ python example/run_pipeline.py --device auto
- Optional conditioning by file id (`train*.csv.gz`) is enabled by default for multi-file training.
- Continuous head can be bounded with `tanh` via `use_tanh_eps` in config.
- Export now clamps continuous features to training min/max and preserves integer/decimal precision.
- Continuous features may be `log1p`-transformed automatically for heavy-tailed columns (see `cont_stats.json`).
- `<UNK>` tokens are replaced by the most frequent token for each discrete column at export.
- The script only samples the first 5000 rows to stay fast.
- `prepare_data.py` runs without PyTorch, but `train.py` and `sample.py` require it.
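
The export-related notes above (log1p transform, clamping to the training min/max, precision, and `<UNK>` replacement) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual code: the function names and the `stats` layout (`log1p`, `min`, `max`, `decimals`) are hypothetical, and the real schema of `cont_stats.json` is not shown here.

```python
import math
from collections import Counter


def export_continuous(value, stats):
    """Post-process one generated continuous value (hypothetical helper).

    `stats` is an assumed per-column dict, for example
    {"log1p": True, "min": 0.0, "max": 100.0, "decimals": 2}.
    """
    # Undo the log1p transform if the column was trained in log space.
    if stats.get("log1p"):
        value = math.expm1(value)
    # Clamp to the range observed in the training data.
    value = min(max(value, stats["min"]), stats["max"])
    # Keep the column's original precision (0 decimals for integer columns).
    return round(value, stats.get("decimals", 0))


def replace_unk(tokens, training_tokens, unk="<UNK>"):
    """Replace <UNK> with the most frequent known training token."""
    counts = Counter(t for t in training_tokens if t != unk)
    most_frequent = counts.most_common(1)[0][0]
    return [most_frequent if t == unk else t for t in tokens]
```

In this sketch the clamp is applied after inverting the transform, so the bounds are expressed in the original units of the column; whether the project clamps before or after the inverse transform is an assumption.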