Tackling the Cold Start Challenge in Demand Forecasting [PyCon DE & PyData Berlin 2024]

Python

Learn effective strategies for demand forecasting without historical data, from basic statistical methods to advanced ML approaches using alternative data sources.

Key takeaways

The cold start problem in demand forecasting occurs when no historical sales data is available for a new product, so traditional forecasting methods that rely on an item's own history are hard to apply
Two main backtesting approaches for cold start scenarios:
- Pseudo-cold: Artificially remove historical data to simulate cold starts
- True-cold: Use actual product launch data to evaluate model performance
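As a rough illustration of the two backtesting approaches, the sketch below builds both splits with pandas. The `sales` table, its column names, the cutoff date, and the two helper functions are hypothetical and only serve the example; they are not taken from the talk.

```python
import pandas as pd

# Hypothetical toy history: one row per product and week.
weeks = pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"])
sales = pd.DataFrame({
    "product_id": ["A"] * 4 + ["B"] * 4 + ["C"] * 2,
    "week": list(weeks) + list(weeks) + list(weeks[2:]),
    "units": [5, 6, 7, 8, 50, 40, 30, 20, 9, 11],
})


def pseudo_cold_split(sales, holdout_ids):
    """Hide the full history of chosen established products to mimic launches."""
    is_holdout = sales["product_id"].isin(holdout_ids)
    return sales[~is_holdout], sales[is_holdout]


def true_cold_split(sales, cutoff):
    """Test only on products first observed on or after the cutoff date."""
    first_seen = sales.groupby("product_id")["week"].transform("min")
    train = sales[sales["week"] < cutoff]
    test = sales[first_seen >= cutoff]
    return train, test


# Pseudo-cold: pretend product B was never seen before.
pseudo_train, pseudo_test = pseudo_cold_split(sales, ["B"])

# True-cold: product C really launched after the cutoff.
true_train, true_test = true_cold_split(sales, pd.Timestamp("2024-01-15"))
```

The pseudo-cold split can use any established product as a test case, while the true-cold split is limited to products that actually launched inside the backtest window.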
Simple statistical baselines often perform comparably to sophisticated models:
- Using historical averages across all products
- Taking averages per product category
- Using averages per category and time step after launch
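The three baselines above can be sketched with a few pandas group-bys. The `history` table, its column names, and the `baseline_forecast` helper are hypothetical; falling back from the most specific average to the most general one is an assumption of this sketch, not something the talk prescribes.

```python
import pandas as pd

# Hypothetical sales of existing products, aligned by weeks since launch.
history = pd.DataFrame({
    "product_id": ["A", "A", "B", "B", "C", "C"],
    "category": ["shoes", "shoes", "shoes", "shoes", "hats", "hats"],
    "weeks_since_launch": [0, 1, 0, 1, 0, 1],
    "units": [10, 20, 30, 40, 2, 4],
})

# Baseline 1: one average across all products.
global_mean = history["units"].mean()

# Baseline 2: one average per product category.
category_mean = history.groupby("category")["units"].mean()

# Baseline 3: one average per category and time step after launch.
launch_curve = history.groupby(["category", "weeks_since_launch"])["units"].mean()


def baseline_forecast(category, horizon):
    """Forecast a new product from the most specific average available."""
    forecast = []
    for step in range(horizon):
        if (category, step) in launch_curve.index:
            forecast.append(float(launch_curve.loc[(category, step)]))
        elif category in category_mean.index:
            forecast.append(float(category_mean.loc[category]))
        else:
            forecast.append(float(global_mean))
    return forecast


shoes_forecast = baseline_forecast("shoes", horizon=3)
unknown_forecast = baseline_forecast("bags", horizon=1)
```

Here the third forecast step for "shoes" falls back to the category average because no existing product in the toy data has a third week of history.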
Available data sources to leverage for cold start forecasting:
- Static covariates (product attributes, categories)
- Time-varying covariates (price, weather, calendar features)
- Images and text descriptions
- Similar product histories
- External data (Google Trends)
Model approaches for handling cold starts:
- Global models that learn across all items perform better than individual per-item models, which have no history to train on for a new product
- Dummy padding and masking for neural networks
- Vector embeddings for images and text
- Nearest neighbor imputation using similar products
- Transfer learning from existing products
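To make the nearest neighbor idea concrete, the sketch below fills in a pseudo-history for a new product by averaging the launch curves of its most similar existing products. The embeddings, the sales matrix, the use of cosine similarity, and the `impute_history` helper are all assumptions for illustration; in practice the vectors might come from an image or text encoder.

```python
import numpy as np

# Hypothetical embeddings of three existing products (one row each).
existing_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])

# Hypothetical sales of those products in their first three weeks.
existing_sales = np.array([
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [100.0, 90.0, 80.0],
])


def impute_history(new_embedding, embeddings, sales, k=2):
    """Average the launch curves of the k most cosine-similar products."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(new_embedding)
    similarity = embeddings @ new_embedding / norms
    nearest = np.argsort(similarity)[::-1][:k]  # indices, most similar first
    return sales[nearest].mean(axis=0)


# The new product resembles the first two existing products.
imputed = impute_history(np.array([1.0, 0.05]), existing_embeddings, existing_sales)
```

The imputed curve can serve as a forecast on its own or as a stand-in history for a model that expects past sales as input.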
Evaluation considerations:
- Backtesting should reflect real-world scenarios
- Include simple baselines for comparison
- Consider data availability for true-cold testing
- Stratify test samples appropriately
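One possible way to stratify the test sample is to draw the same fraction of launches from each product category, so that small categories are still represented. The `launches` table, the choice of category as the stratification key, and the 25% fraction are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical list of product launches with an imbalanced category mix.
launches = pd.DataFrame({
    "product_id": [f"P{i}" for i in range(12)],
    "category": ["shoes"] * 8 + ["hats"] * 4,
})

# Draw 25% of the launches within each category for the test set.
test_products = launches.groupby("category").sample(frac=0.25, random_state=0)

# Everything that was not sampled remains available for training.
train_products = launches.drop(test_products.index)
```

Sampling 25% of all launches without grouping could, by chance, leave the smaller category out of the test set entirely.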
Model selection depends on:
- Data availability and quality
- Business KPIs and risk tolerance
- Scalability requirements
- Forecast horizon length
