Evaluation Protocol

Primary Metrics

  • avg_ks: mean Kolmogorov-Smirnov (KS) statistic across continuous features
  • avg_jsd: mean Jensen-Shannon divergence (JSD) across discrete feature marginals
  • avg_lag1_diff: mean lag-1 correlation mismatch across features
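As a rough illustration of how the discrete-marginal part might be aggregated, here is a minimal JSD sketch (the actual computation lives in the example/ scripts; the `jsd` helper below is illustrative, not the project's function):

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (base 2, in [0, 1])."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        return float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# avg_jsd is the mean of the per-feature JSDs of the discrete marginals.
avg_jsd = float(np.mean([jsd([0.5, 0.5], [0.5, 0.5]),   # identical -> 0
                         jsd([1.0, 0.0], [0.0, 1.0])])) # disjoint  -> 1
```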

Diagnostic Metrics

  • per-feature KS: example/diagnose_ks.py
  • filtered KS: example/filtered_metrics.py (remove collapsed/outlier features)
  • ranked KS: example/ranked_ks.py (contribution analysis)

KS Implementation Notes

  • KS is computed with tie-aware CDFs (important for discrete/spiky features).
  • Reference data supports glob input and aggregates all matching files.
  • Use --max-rows to cap reference rows for faster diagnostics.

Report both:

  1. Full metrics (no filtering)
  2. Filtered metrics (diagnostic only)

Always list which features were filtered. If the KS-only post-process (empirical resampling) is applied, note it explicitly, because it can weaken joint realism.

ProgramGenerator Metrics (Type 1)

For setpoints/demands:

  • dwell-time distribution
  • change count per day
  • step-size distribution
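For a piecewise-constant setpoint series, all three statistics fall out of the change points. A minimal sketch (function name and the 96-steps-per-day default are illustrative assumptions, not the project's code):

```python
import numpy as np

def setpoint_stats(x, steps_per_day=96):
    """Dwell times, changes per day, and step sizes of a piecewise-constant series."""
    x = np.asarray(x, dtype=float)
    change = np.flatnonzero(np.diff(x) != 0)      # indices just before each change
    step_sizes = np.diff(x)[change]               # signed jump at each change
    # Run lengths between consecutive change points, including first and last runs.
    bounds = np.concatenate(([-1], change, [len(x) - 1]))
    dwell = np.diff(bounds)
    changes_per_day = len(change) * steps_per_day / len(x)
    return dwell, changes_per_day, step_sizes
```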

Controller Metrics (Type 2)

  • saturation ratio near bounds
  • change rate and median step size
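These can be read off a controller trace directly; a sketch, assuming a 1% band near each bound counts as "saturated" (the function name and tolerance are illustrative):

```python
import numpy as np

def controller_stats(u, lo, hi, tol=0.01):
    """Saturation ratio near the bounds, change rate, and median step size."""
    u = np.asarray(u, dtype=float)
    span = hi - lo
    saturation_ratio = float(((u <= lo + tol * span) | (u >= hi - tol * span)).mean())
    steps = np.abs(np.diff(u))
    moving = steps > 0
    change_rate = float(moving.mean())            # fraction of steps with any change
    median_step = float(np.median(steps[moving])) if moving.any() else 0.0
    return saturation_ratio, change_rate, median_step
```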

Actuator Metrics (Type 3)

  • top-k spike mass (top-1/top-3)
  • unique ratio
  • dwell length
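A sketch of these three actuator statistics, where "top-k spike mass" is read as the total frequency of the k most common values (the function name is illustrative):

```python
import numpy as np

def actuator_stats(x, k=3):
    """Top-k spike mass, unique ratio, and mean dwell length of an actuator trace."""
    x = np.asarray(x, dtype=float)
    _, counts = np.unique(x, return_counts=True)
    mass = np.sort(counts)[::-1] / len(x)         # value frequencies, largest first
    top1, topk = float(mass[0]), float(mass[:k].sum())
    unique_ratio = len(counts) / len(x)
    change = np.flatnonzero(np.diff(x) != 0)
    dwell = np.diff(np.concatenate(([-1], change, [len(x) - 1])))
    return top1, topk, unique_ratio, float(dwell.mean())
```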

PV Metrics (Type 4)

  • q05/q50/q95 + tail ratio
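A sketch of the PV quantile summary. Note the tail-ratio definition below (upper-tail spread over lower-tail spread) is only one plausible reading; the project's scripts may define it differently:

```python
import numpy as np

def pv_stats(x):
    """q05/q50/q95 quantiles plus an (assumed) tail ratio."""
    q05, q50, q95 = np.quantile(np.asarray(x, dtype=float), [0.05, 0.5, 0.95])
    # Hypothetical definition: upper-tail spread relative to lower-tail spread.
    tail_ratio = (q95 - q50) / (q50 - q05) if q50 > q05 else float("nan")
    return float(q05), float(q50), float(q95), float(tail_ratio)
```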

Aux Metrics (Type 6)

  • mean/std
  • lag-1 correlation
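The aux statistics are straightforward; a sketch, computing lag-1 autocorrelation as the Pearson correlation of the series with its one-step shift (the function name is illustrative):

```python
import numpy as np

def aux_stats(x):
    """Mean, standard deviation, and lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    lag1 = float(np.corrcoef(x[:-1], x[1:])[0, 1])
    return float(x.mean()), float(x.std()), lag1
```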