Add full quantile stats and post-hoc calibration

2026-01-28 00:52:42 +08:00
parent 6d5c5fffb1
commit c68a6e3c97
9 changed files with 91 additions and 49 deletions

@@ -145,6 +145,7 @@ Key steps:
- Streaming mean/std/min/max + int-like detection
- Optional **log1p transform** for heavy-tailed continuous columns
- Optional **quantile transform** (TabDDPM-style) for continuous columns (skips extra standardization)
+- Optional **post-hoc quantile calibration** to align 1D CDFs after sampling
- Discrete vocab + most frequent token
- Windowed batching with **shuffle buffer**
@@ -161,7 +162,8 @@ Export process:
- Output: `trend + residual`
- De-normalize continuous values
- Inverse quantile transform (if enabled; no extra de-standardization)
-- Bound to observed min/max (clamp or sigmoid mapping)
+- Optional post-hoc quantile calibration (if enabled)
+- Bound to observed min/max (clamp / sigmoid / soft_tanh / none)
- Restore discrete tokens from vocab
- Write to CSV
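
Note on the two new export steps: the diff names post-hoc quantile calibration and four bounding modes but does not show how they are computed. The following is a minimal Python sketch of one plausible reading, not the code in this commit. The function names `quantile_calibrate` and `bound`, the evenly spaced reference quantile grid, and the exact `sigmoid` and `soft_tanh` formulas are all assumptions made for illustration.

```python
import numpy as np


def quantile_calibrate(synth, real_quantiles):
    """Remap one sampled column so its empirical 1D CDF follows the reference.

    Assumption: `real_quantiles` holds quantiles of the training column taken
    at evenly spaced probabilities from 0 to 1 (hypothetical stats layout).
    """
    synth = np.asarray(synth, dtype=float)
    real_quantiles = np.asarray(real_quantiles, dtype=float)
    # Rank of each sampled value within the sample (0 = smallest).
    ranks = np.argsort(np.argsort(synth))
    # Mid-rank CDF position, strictly inside (0, 1).
    u = (ranks + 0.5) / synth.size
    grid = np.linspace(0.0, 1.0, real_quantiles.size)
    # Look up the reference quantile at each CDF position.
    return np.interp(u, grid, real_quantiles)


def bound(x, lo, hi, mode="clamp"):
    """Restrict values to the observed [lo, hi] range (assumed formulas)."""
    x = np.asarray(x, dtype=float)
    if mode == "none":
        return x
    if mode == "clamp":
        return np.clip(x, lo, hi)
    if mode == "sigmoid":
        # Logistic squash, written via tanh to avoid overflow in exp().
        return lo + (hi - lo) * 0.5 * (1.0 + np.tanh(0.5 * x))
    if mode == "soft_tanh":
        mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
        if half == 0.0:
            return np.full_like(x, mid)
        # Near-identity around the midpoint, saturating smoothly at the edges.
        return mid + half * np.tanh((x - mid) / half)
    raise ValueError(f"unknown bounding mode: {mode}")
```

In this sketch calibration runs before bounding, matching the order of the export list above. Because the remapping is rank-based, it preserves the ordering of values within a column while replacing the values themselves with reference quantiles.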