Malte Tichy - Paradoxes in model training and evaluation under constraints | PyData Global 2023

Explore how capacity constraints affect ML demand forecasting accuracy. Learn methods to properly model and evaluate truncated data while avoiding bias in predictions.

Key takeaways
  • When dealing with constrained demand (like limited inventory), sales data alone does not reflect true customer demand, as it’s capped by capacity limits

  • Don’t equate sales with demand unless you’re certain capacity limits are never reached; training on censored or truncated data biases the model

  • Evaluate models by grouping on predictions rather than on outcomes to avoid selection bias: group by the predicted probability of hitting capacity, not by whether capacity was actually hit

  • Account for constraints explicitly in probability distributions and expected-value calculations rather than using simplified approximations

  • Distinguish between unconstrained demand (potential customer interest) and constrained demand (actual sales limited by capacity)

  • Using constrained sales as an approximation for unconstrained demand leads to systematic underforecasting and, in turn, more stockouts

  • Forward-looking evaluation (based on predictions) is more meaningful than backward-looking analysis of outcomes

  • Balancing stockouts against waste requires proper probabilistic modeling of demand under constraints

  • Tools like statsmodels can help with truncated distribution analysis, though more open-source solutions are needed

  • Clean, controlled test cases should be evaluated before moving to complex real-world scenarios with additional complications
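
Several of the takeaways above (sales versus demand, explicit expected values under a constraint, and grouping by predictions rather than outcomes) can be illustrated with a small controlled simulation. The sketch below is not taken from the talk. It assumes Poisson-distributed demand, a fixed capacity of 6, and three made-up demand rates; the helper name expected_constrained_sales is hypothetical. The forecast is the exact expected value of sales under the constraint, so any bias that shows up comes from how the evaluation groups the data, not from the model.

```python
import numpy as np
from scipy.stats import poisson


def expected_constrained_sales(rate, capacity):
    """E[min(D, capacity)] for D ~ Poisson(rate), via the tail-sum formula."""
    k = np.arange(capacity)            # k = 0, ..., capacity - 1
    return poisson.sf(k, rate).sum()   # sum over k of P(D > k)


rng = np.random.default_rng(0)
capacity = 6                                        # assumed capacity limit
rates = rng.choice([2.0, 5.0, 8.0], size=200_000)   # assumed true demand rates
demand = rng.poisson(rates)            # unconstrained demand (not observed in practice)
sales = np.minimum(demand, capacity)   # constrained demand, i.e. observed sales
hit = demand >= capacity               # outcome: capacity was reached

# Ideal forecast of sales: the exact expectation under the constraint.
uniq, inverse = np.unique(rates, return_inverse=True)
pred_sales = np.array([expected_constrained_sales(r, capacity) for r in uniq])[inverse]

# Predicted probability of reaching capacity, P(D >= capacity).
hit_prob = poisson.sf(capacity - 1, rates)

print(f"mean demand {demand.mean():.3f} vs mean sales {sales.mean():.3f}")
print(f"overall forecast - sales: {pred_sales.mean() - sales.mean():+.3f}")

# Backward-looking: group by the outcome (was capacity actually hit?).
bias_by_outcome = {}
for flag in (False, True):
    mask = hit == flag
    bias_by_outcome[flag] = pred_sales[mask].mean() - sales[mask].mean()
    print(f"actual hit={flag}: forecast - sales = {bias_by_outcome[flag]:+.3f}")

# Forward-looking: group by the predicted probability of hitting capacity.
bias_by_prediction = {}
for p in np.unique(hit_prob):
    mask = hit_prob == p
    bias_by_prediction[float(p)] = pred_sales[mask].mean() - sales[mask].mean()
    print(f"predicted hit prob={p:.2f}: forecast - sales = {bias_by_prediction[float(p)]:+.3f}")
```

Under these assumptions, the outcome-based groups make the ideal forecast look too low wherever capacity was hit and too high wherever it was not, while the prediction-based groups show a bias close to zero. Mean sales also fall below mean demand, which is why training on sales as if they were demand pulls forecasts downward.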