Malte Tichy - Paradoxes in model training and evaluation under constraints | PyData Global 2023

Python

Explore how capacity constraints affect ML demand forecasting accuracy. Learn methods to properly model and evaluate truncated data while avoiding bias in predictions.

Key takeaways

When demand is constrained (for example, by limited inventory), sales data alone does not reflect true customer demand, because sales are capped at the capacity limit
Don’t equate sales with demand unless you’re certain capacity limits are never reached; training on censored or truncated data biases the model
Evaluate models by grouping on predictions rather than outcomes to avoid selection bias: group by the predicted probability of hitting capacity, not by whether capacity was actually hit
Account for constraints explicitly in probability distributions and expected-value calculations rather than using simplified approximations
Distinguish between unconstrained demand (potential customer interest) and constrained demand (actual sales limited by capacity)
Using constrained sales as an approximation for unconstrained demand leads to systematic underforecasting and increasing stockouts
Forward-looking evaluation (based on predictions) is more meaningful than backward-looking analysis of outcomes
Balance between stockouts and waste requires proper probabilistic modeling of demand under constraints
Tools like statsmodels can help with truncated distribution analysis, though more open-source solutions are needed
Clean, controlled test cases should be evaluated before moving to complex real-world scenarios with additional complications
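
The underforecasting effect described in the takeaways can be reproduced in a clean, controlled test case. The sketch below is not taken from the talk: it assumes Poisson-distributed demand, a hypothetical true mean of 10 units, and a hypothetical capacity of 12 units, and all function names are illustrative. It compares the naive forecast (the average of capped sales) with the true mean demand and with the analytic expected sales under the constraint, E[min(D, capacity)].

```python
import math
import random


def poisson_pmf(k, mean):
    """P(D = k) for D ~ Poisson(mean), computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def expected_constrained_sales(mean, capacity):
    """E[min(D, capacity)]: demand below capacity counts as is, the rest is capped."""
    below = sum(k * poisson_pmf(k, mean) for k in range(capacity))
    p_hit = max(0.0, 1.0 - sum(poisson_pmf(k, mean) for k in range(capacity)))
    return below + capacity * p_hit


def sample_poisson(rng, mean):
    """Draw one Poisson sample with Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Hypothetical setup (an assumption, not from the talk): mean demand 10, capacity 12.
rng = random.Random(0)
true_mean, capacity = 10.0, 12
demand = [sample_poisson(rng, true_mean) for _ in range(50_000)]
sales = [min(d, capacity) for d in demand]  # what the sales records would show

naive_forecast = sum(sales) / len(sales)  # treats capped sales as if they were demand
analytic_sales = expected_constrained_sales(true_mean, capacity)
capacity_hit_rate = sum(s == capacity for s in sales) / len(sales)

print(f"true mean demand:            {true_mean:.2f}")
print(f"mean of capped sales:        {naive_forecast:.2f}")
print(f"analytic E[min(D, capacity)]: {analytic_sales:.2f}")
print(f"share of periods at capacity: {capacity_hit_rate:.2%}")
```

Under these assumptions, the mean of capped sales falls below the true mean demand but agrees with the analytic constrained expectation. If that sales average were used to set the next capacity, capacity would shrink and the gap would widen, which is the feedback loop behind increasing stockouts.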

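The selection bias from grouping by outcomes can also be illustrated with simulated data. The following is a minimal sketch under stated assumptions, not the speaker's code: demand is Poisson, the forecast is perfectly calibrated by construction (it uses the true mean), and the three mean levels, the capacity of 12, and the 0.5 grouping threshold are all hypothetical. Even for this ideal forecast, grouping by actual capacity hits makes it look biased, while grouping by the predicted hit probability does not.

```python
import math
import random


def poisson_pmf(k, mean):
    """P(D = k) for D ~ Poisson(mean)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def constrained_forecast(mean, capacity):
    """Return (expected sales, probability of hitting capacity) for Poisson demand."""
    p_below = sum(poisson_pmf(k, mean) for k in range(capacity))
    p_hit = max(0.0, 1.0 - p_below)
    below = sum(k * poisson_pmf(k, mean) for k in range(capacity))
    return below + capacity * p_hit, p_hit


def sample_poisson(rng, mean):
    """Draw one Poisson sample with Knuth's multiplication method."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_bias(rows):
    """Average of (actual sales - forecast sales); zero means unbiased."""
    return sum(actual - forecast for _, forecast, actual in rows) / len(rows)


# Hypothetical setup: three demand levels, one shared capacity.
rng = random.Random(1)
capacity = 12
forecasts = {m: constrained_forecast(m, capacity) for m in (6.0, 10.0, 14.0)}

records = []  # (predicted hit probability, forecast sales, actual sales)
for _ in range(40_000):
    mean = rng.choice(list(forecasts))
    forecast_sales, p_hit = forecasts[mean]
    actual_sales = min(sample_poisson(rng, mean), capacity)
    records.append((p_hit, forecast_sales, actual_sales))

# Backward-looking: split by what actually happened (selects on the outcome).
bias_actual_hit = mean_bias([r for r in records if r[2] == capacity])
bias_actual_free = mean_bias([r for r in records if r[2] < capacity])

# Forward-looking: split by what the model predicted before the outcome was known.
bias_pred_high = mean_bias([r for r in records if r[0] >= 0.5])
bias_pred_low = mean_bias([r for r in records if r[0] < 0.5])

print(f"grouped by actual hit:    {bias_actual_hit:+.2f} (hit), {bias_actual_free:+.2f} (no hit)")
print(f"grouped by predicted hit: {bias_pred_high:+.2f} (high), {bias_pred_low:+.2f} (low)")
```

The outcome-based groups show an apparent underforecast where capacity was hit and an apparent overforecast elsewhere, although the forecast is correct by construction. The prediction-based groups stay close to zero bias, which is why the forward-looking grouping is the more meaningful check.
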
