# Evaluation Protocol
## Primary Metrics
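- **avg_ks**: mean two-sample Kolmogorov-Smirnov (KS) statistic across continuous features
- **avg_jsd**: mean Jensen-Shannon divergence (JSD) across the marginals of discrete features
- **avg_lag1_diff**: lag-1 correlation mismatch between real and generated data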
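A minimal sketch of how `avg_jsd` and `avg_lag1_diff` could be computed, assuming each dataset is a dict mapping feature name to a list of values. The base-2 JSD (bounded in [0, 1]), the Pearson lag-1 autocorrelation, and averaging the absolute per-feature gap are assumptions, and all helper names are hypothetical rather than the repo's code. `avg_ks` follows the same per-feature-then-average pattern with a KS statistic.

```python
import math
from collections import Counter


def jsd(real, gen):
    """Base-2 Jensen-Shannon divergence (in [0, 1]) between two empirical categorical marginals."""
    p_counts, q_counts = Counter(real), Counter(gen)
    support = set(p_counts) | set(q_counts)
    p = {k: p_counts[k] / len(real) for k in support}
    q = {k: q_counts[k] / len(gen) for k in support}
    m = {k: 0.5 * (p[k] + q[k]) for k in support}

    def kl(a, b):
        # Skip zero-probability terms; b is the mixture, so b[k] > 0 wherever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in support if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def lag1_autocorr(x):
    """Pearson correlation between x[t] and x[t+1]; returns 0.0 for a constant series (assumption)."""
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def avg_jsd(real, gen, discrete_cols):
    """Mean JSD over the discrete features' marginals."""
    return sum(jsd(real[c], gen[c]) for c in discrete_cols) / len(discrete_cols)


def avg_lag1_diff(real, gen, cols):
    """Mean absolute gap in lag-1 autocorrelation between real and generated series."""
    return sum(abs(lag1_autocorr(real[c]) - lag1_autocorr(gen[c])) for c in cols) / len(cols)
```

Identical marginals give a JSD of 0 and disjoint ones give 1, which keeps `avg_jsd` comparable across features with different cardinalities.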
## Diagnostic Metrics
- **per-feature KS**: `example/diagnose_ks.py`
- **filtered KS**: `example/filtered_metrics.py` (removes collapsed or outlier features)
- **ranked KS**: `example/ranked_ks.py` (contribution analysis)
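The filtered and ranked diagnostics can be sketched as below. This is not the logic of `example/filtered_metrics.py` or `example/ranked_ks.py`; it assumes per-feature KS values are already available as a dict, and the "collapsed" rule (a generated feature taking a single value) is a hypothetical stand-in. The filtered average also returns the dropped features so they can be reported.

```python
def find_collapsed(gen, cols):
    """Hypothetical rule: a generated feature is 'collapsed' if it takes a single value."""
    return [c for c in cols if len(set(gen[c])) <= 1]


def filtered_avg_ks(ks_by_feature, exclude):
    """Mean KS over features not in `exclude`, plus the sorted list of dropped features to report."""
    kept = {f: v for f, v in ks_by_feature.items() if f not in exclude}
    dropped = sorted(set(ks_by_feature) - set(kept))
    return sum(kept.values()) / len(kept), dropped


def rank_ks_contributions(ks_by_feature):
    """Features sorted by KS (descending) with each one's share of the summed KS."""
    total = sum(ks_by_feature.values())
    ranked = sorted(ks_by_feature.items(), key=lambda kv: kv[1], reverse=True)
    return [(f, v, v / total if total else 0.0) for f, v in ranked]
```

The share column shows whether a high `avg_ks` is driven by a few features or spread evenly.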
## KS Implementation Notes
- KS is computed with **tie-aware** CDFs, which matters for discrete or spiky features where many samples share the same value.
- The reference data path accepts a **glob pattern**; all matching files are aggregated.
- Use `--max-rows` to cap reference rows for faster diagnostics.
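One way to make KS tie-aware is to evaluate both empirical CDFs only at the distinct pooled values, using a right-continuous CDF (fraction of samples <= v). Stepping through a merged sorted sample one element at a time would instead report a spurious gap whenever both samples share a value. `load_reference` is a hypothetical helper illustrating glob aggregation and a row cap analogous to `--max-rows`; it assumes CSV files with a header row, and the actual scripts may read data differently.

```python
import bisect
import csv
import glob


def ecdf_at(sorted_x, v):
    """Right-continuous empirical CDF: fraction of samples <= v (tied values counted together)."""
    return bisect.bisect_right(sorted_x, v) / len(sorted_x)


def ks_tie_aware(x, y):
    """Two-sample KS statistic sup_v |F_x(v) - F_y(v)|, evaluated at the distinct pooled values."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(ecdf_at(xs, v) - ecdf_at(ys, v)) for v in set(xs) | set(ys))


def load_reference(pattern, column, max_rows=None):
    """Hypothetical loader: concatenate `column` from every CSV matching `pattern`, capped at max_rows."""
    values = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                if max_rows is not None and len(values) >= max_rows:
                    return values
                values.append(float(row[column]))
    return values
```

For example, `ks_tie_aware([0, 0], [0, 0])` is 0, whereas a naive element-by-element walk would briefly see one CDF at 0.5 and the other at 0.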
## Recommended Reporting
Report both:
1) **Full metrics** (no filtering)
2) **Filtered metrics** (diagnostic only)

Always list which features were filtered.

If you use the KS-only postprocess (empirical resampling), state it explicitly, because it optimizes for KS alone and can weaken joint realism.
## ProgramGenerator Metrics (Type 1)
For setpoints/demands:
- dwell-time distribution
- change count per day
- step-size distribution
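A minimal sketch of the Type 1 metrics, assuming a setpoint is a list of samples at a fixed rate. `samples_per_day` is a hypothetical parameter, dwell times are counted in samples, and a "change" is any difference between consecutive samples. The first and last runs are truncated by the observation window, so their dwell times are lower bounds.

```python
def run_lengths(x):
    """Lengths of maximal runs of identical consecutive values (dwell times, in samples)."""
    runs, length = [], 1
    for prev, cur in zip(x, x[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    if x:
        runs.append(length)
    return runs


def change_count_per_day(x, samples_per_day):
    """Number of value changes, normalized by the number of days the series covers."""
    changes = sum(1 for a, b in zip(x, x[1:]) if a != b)
    return changes / (len(x) / samples_per_day)


def step_sizes(x):
    """Absolute size of each nonzero step between consecutive samples."""
    return [abs(b - a) for a, b in zip(x, x[1:]) if b != a]
```

Comparing the real and generated distributions of `run_lengths` and `step_sizes` (for example with KS) gives the dwell-time and step-size metrics.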
## Controller Metrics (Type 2)
- saturation ratio (fraction of samples near the bounds)
- change rate and median step size
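The controller metrics could be computed roughly as follows. The bounds `lo`/`hi` and the "near" tolerance `tol_frac` (a fraction of the range) are hypothetical parameters; change rate is taken here as the fraction of consecutive sample pairs whose value differs.

```python
import statistics


def saturation_ratio(x, lo, hi, tol_frac=0.01):
    """Fraction of samples within tol_frac * (hi - lo) of either bound."""
    tol = tol_frac * (hi - lo)
    return sum(1 for v in x if v <= lo + tol or v >= hi - tol) / len(x)


def change_rate(x):
    """Fraction of consecutive sample pairs where the output changes."""
    return sum(1 for a, b in zip(x, x[1:]) if a != b) / (len(x) - 1)


def median_step(x):
    """Median absolute size of the nonzero steps; 0.0 if the output never moves."""
    steps = [abs(b - a) for a, b in zip(x, x[1:]) if b != a]
    return statistics.median(steps) if steps else 0.0
```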
## Actuator Metrics (Type 3)
- top-k spike mass (top-1/top-3)
- unique ratio
- dwell length
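A minimal sketch of the actuator metrics, assuming top-k spike mass means the fraction of samples taken by the k most frequent values and unique ratio means distinct values per sample; both readings are assumptions. `mean_dwell_length` summarizes run lengths of constant output and expects a non-empty series.

```python
from collections import Counter


def topk_spike_mass(x, k):
    """Fraction of samples that fall on the k most frequent values."""
    return sum(count for _, count in Counter(x).most_common(k)) / len(x)


def unique_ratio(x):
    """Number of distinct values divided by the number of samples."""
    return len(set(x)) / len(x)


def mean_dwell_length(x):
    """Mean length of runs of identical consecutive values (assumes a non-empty series)."""
    runs, length = [], 1
    for prev, cur in zip(x, x[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sum(runs) / len(runs)
```

A generator that over-concentrates on a few actuator positions shows up as a top-1/top-3 mass well above the real data's.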
## PV Metrics (Type 4)
- q05/q50/q95 + tail ratio
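The PV quantiles can be taken from the standard library's `statistics.quantiles`, as sketched below. The tail-ratio definition here, upper spread over lower spread `(q95 - q50) / (q50 - q05)`, is a hypothetical choice; the repo may define it differently.

```python
import statistics


def pv_quantiles(x):
    """q05, q50, q95 by linear interpolation ('inclusive' method); needs at least two samples."""
    cuts = statistics.quantiles(x, n=100, method="inclusive")
    return cuts[4], cuts[49], cuts[94]


def tail_ratio(x):
    """Hypothetical definition: (q95 - q50) / (q50 - q05); 1.0 for a symmetric spread."""
    q05, q50, q95 = pv_quantiles(x)
    lower = q50 - q05
    return (q95 - q50) / lower if lower > 0 else float("inf")
```

Computing both functions on the real and generated series and comparing the results shows whether the generator reproduces the PV's skew, not just its center.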
## Aux Metrics (Type 6)
- mean/std
- lag-1 correlation