# Evaluation Protocol

## Primary Metrics
- **avg_ks**: mean KS across continuous features
- **avg_jsd**: mean JSD across discrete feature marginals
- **avg_lag1_diff**: lag-1 correlation mismatch

## Diagnostic Metrics
- **per-feature KS**: `example/diagnose_ks.py`
- **filtered KS**: `example/filtered_metrics.py` (remove collapsed/outlier features)
- **ranked KS**: `example/ranked_ks.py` (contribution analysis)

## KS Implementation Notes
- KS is computed with **tie-aware** CDFs (important for discrete/spiky features).
- Reference data supports **glob input** and aggregates all matching files.
- Use `--max-rows` to cap reference rows for faster diagnostics.

## Recommended Reporting
Report both:
1) **Full metrics** (no filtering)
2) **Filtered metrics** (diagnostic only)

Always list which features were filtered.
If using KS-only postprocess (empirical resampling), note it explicitly because it can weaken joint realism.

## Program-Generator Metrics (Type 1)
For setpoints/demands:
- dwell-time distribution
- change-count per day
- step-size distribution

## Controller Metrics (Type 2)
- saturation ratio near bounds
- change rate and median step size

## Actuator Metrics (Type 3)
- top-k spike mass (top1/top3)
- unique ratio
- dwell length

## PV Metrics (Type 4)
- q05/q50/q95 + tail ratio

## Aux Metrics (Type 6)
- mean/std
- lag-1 correlation
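
## Illustrative Sketch: Primary Metrics

To make the three primary metrics concrete, here is a minimal NumPy sketch of how a per-feature KS statistic with tie-aware CDFs, a JSD over discrete marginals, and a lag-1 correlation mismatch could be computed. It is not the code in the `example/` scripts. The function names (`tie_aware_ks`, `jsd`, `lag1_corr`, `lag1_diff`) are hypothetical, and several details are assumptions: right-continuous empirical CDFs evaluated on the union of observed values, JSD reported as a base-2 divergence (not its square root), and the lag-1 mismatch taken as an absolute difference of Pearson correlations.

```python
import numpy as np


def tie_aware_ks(real, synth):
    """Two-sample KS statistic using right-continuous empirical CDFs.

    Both CDFs are evaluated only at the distinct observed values, so a
    repeated value (a tie or spike) counts as one jump of the correct height.
    """
    real = np.sort(np.asarray(real, dtype=float))
    synth = np.sort(np.asarray(synth, dtype=float))
    if real.size == 0 or synth.size == 0:
        raise ValueError("both samples must be non-empty")
    grid = np.unique(np.concatenate([real, synth]))
    # side="right" counts every observation <= grid value, ties included.
    cdf_real = np.searchsorted(real, grid, side="right") / real.size
    cdf_synth = np.searchsorted(synth, grid, side="right") / synth.size
    return float(np.max(np.abs(cdf_real - cdf_synth)))


def jsd(real, synth):
    """Jensen-Shannon divergence (base 2, in [0, 1]) of two discrete marginals."""
    real = np.asarray(real)
    synth = np.asarray(synth)
    cats = np.unique(np.concatenate([real, synth]))
    p = np.array([np.mean(real == c) for c in cats])
    q = np.array([np.mean(synth == c) for c in cats])
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def lag1_corr(x):
    """Pearson correlation between x[t] and x[t+1]; 0.0 for constant series."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]
    if a.size < 2 or a.std() == 0 or b.std() == 0:
        return 0.0  # assumption: treat undefined correlation as zero
    return float(np.corrcoef(a, b)[0, 1])


def lag1_diff(real, synth):
    """Absolute mismatch between the lag-1 correlations of two series."""
    return abs(lag1_corr(real) - lag1_corr(synth))


if __name__ == "__main__":
    # A spiky feature: most mass on a single value in each sample.
    print(tie_aware_ks([0, 0, 0, 1], [0, 1, 1, 1]))  # 0.5
```

Averaging `tie_aware_ks` over continuous features, `jsd` over discrete features, and `lag1_diff` over all features would give quantities analogous to `avg_ks`, `avg_jsd`, and `avg_lag1_diff`; the exact aggregation used by the project may differ.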