2026-01-27 18:39:24 +08:00
parent c46c25d607
commit a24c60c506
22 changed files with 357 additions and 8 deletions

@@ -144,6 +144,7 @@ Defined in `example/data_utils.py` + `example/prepare_data.py`.
Key steps:
- Streaming mean/std/min/max + int-like detection
- Optional **log1p transform** for heavy-tailed continuous columns
+- Optional **quantile transform** (TabDDPM-style) for continuous columns
- Discrete vocab + most frequent token
- Windowed batching with **shuffle buffer**
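
The two optional transforms listed above can be sketched as follows. This is a minimal NumPy illustration, not the code in `example/data_utils.py`: the helper names `fit_quantiles` and `quantile_forward`, the choice of 100 quantiles, and the uniform `[0, 1]` output range are all assumptions made for the example.

```python
import numpy as np

def fit_quantiles(values, n_quantiles=100):
    """Return (probs, qvals): an empirical CDF table for one column.

    Hypothetical helper; n_quantiles=100 is an arbitrary choice.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    qvals = np.quantile(values, probs)
    return probs, qvals

def quantile_forward(values, probs, qvals):
    """Map raw values to [0, 1] by interpolating the empirical CDF."""
    return np.interp(values, qvals, probs)

# Toy heavy-tailed column (synthetic data, for illustration only).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# Option 1: log1p compresses the right tail but keeps the shape.
logged = np.log1p(raw)

# Option 2: the quantile transform flattens the distribution entirely.
probs, qvals = fit_quantiles(raw)
uniform = quantile_forward(raw, probs, qvals)
```

The `(probs, qvals)` table has to be stored alongside the other column statistics, because the export step needs it to invert the transform.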
@@ -159,7 +160,8 @@ Export process:
- Diffusion generates residuals
- Output: `trend + residual`
- De-normalize continuous values
-- Clamp to observed min/max
+- Inverse quantile transform (if enabled)
+- Bound to observed min/max (clamp or sigmoid mapping)
- Restore discrete tokens from vocab
- Write to CSV
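
The continuous-column part of the export steps above might look like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: the function names, the `mode` argument, and in particular the sigmoid parameterization (centered on the midpoint, scaled so the slope there is 1) are hypothetical, since the source only says "clamp or sigmoid mapping".

```python
import numpy as np

def denormalize(x, mean, std):
    """Undo z-score normalization."""
    return x * std + mean

def quantile_inverse(u, probs, qvals):
    """Map values in [0, 1] back to raw scale via the stored quantile table.

    np.interp holds inputs outside [0, 1] at the end points.
    """
    return np.interp(u, probs, qvals)

def bound(x, lo, hi, mode="clamp"):
    """Keep values inside the observed [lo, hi] range."""
    if mode == "clamp":
        return np.clip(x, lo, hi)
    # Assumed sigmoid mapping: roughly the identity near the midpoint,
    # saturating smoothly toward lo and hi.
    center = 0.5 * (lo + hi)
    scale = (hi - lo) / 4.0
    return lo + (hi - lo) / (1.0 + np.exp(-(x - center) / scale))

# Hypothetical stored statistics for one column.
probs = np.linspace(0.0, 1.0, 5)
qvals = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
mean, std = 0.5, 0.25
lo, hi = 0.0, 10.0

trend = np.array([-1.0, 0.0, 1.0])
residual = np.array([-0.5, 0.0, 0.5])  # stands in for diffusion output

u = denormalize(trend + residual, mean, std)
raw = quantile_inverse(u, probs, qvals)
clamped = bound(raw, lo, hi, mode="clamp")
squashed = bound(raw, lo, hi, mode="sigmoid")
```

Clamping leaves in-range values untouched but can pile mass on the two end points; a sigmoid mapping avoids that pile-up at the cost of slightly distorting values that were already in range.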