Tackling the Cold Start Challenge in Demand Forecasting [PyCon DE & PyData Berlin 2024]
Learn effective strategies for demand forecasting without historical data, from basic statistical methods to advanced ML approaches using alternative data sources.
-
The cold start problem in demand forecasting occurs when no historical sales data is available for a new product, which makes traditional forecasting methods hard to apply
-
Two main backtesting approaches for cold start scenarios:
- Pseudo-cold: Artificially remove historical data to simulate cold starts
- True-cold: Use actual product launch data to evaluate model performance
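The two backtesting setups can be sketched with pandas as follows. This is a minimal illustration rather than the speaker's implementation: the column names (`product_id`, `date`, `units`), the helper names, and the toy data are all assumptions made for the example.

```python
import pandas as pd

def pseudo_cold_split(sales, holdout_ids):
    """Pseudo-cold: hide the full history of chosen established products,
    so the model has to forecast them as if they were new."""
    is_holdout = sales["product_id"].isin(holdout_ids)
    return sales[~is_holdout], sales[is_holdout]

def true_cold_split(sales, cutoff):
    """True-cold: train on everything before the cutoff and test only on
    products whose first recorded sale is on or after the cutoff."""
    launch_date = sales.groupby("product_id")["date"].transform("min")
    train = sales[sales["date"] < cutoff]
    test = sales[launch_date >= cutoff]
    return train, test

# Toy data (made up): A and B are established, C launches on 2024-01-15.
sales = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-08", "2024-01-15",
        "2024-01-01", "2024-01-08",
        "2024-01-15", "2024-01-22",
    ]),
    "units": [10, 12, 11, 5, 6, 8, 9],
})

pseudo_train, pseudo_test = pseudo_cold_split(sales, holdout_ids=["B"])
true_train, true_test = true_cold_split(sales, cutoff=pd.Timestamp("2024-01-15"))
```

The pseudo-cold split works on any dataset, while the true-cold split only yields test data if enough products actually launched after the cutoff.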
-
Simple statistical baselines often perform comparably to sophisticated models:
- Using historical averages across all products
- Taking averages per product category
- Using averages per category and time step after launch
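The three baselines above could look roughly like this in pandas. The schema (`category`, `step` as periods since launch, `units`), the fallback order, and the numbers are assumptions chosen for illustration only.

```python
import pandas as pd

# Toy launch histories of existing products (made-up values).
history = pd.DataFrame({
    "product_id": ["A", "A", "B", "B", "C", "C"],
    "category":   ["shoes", "shoes", "shoes", "shoes", "hats", "hats"],
    "step":       [0, 1, 0, 1, 0, 1],  # periods since launch
    "units":      [10.0, 20.0, 30.0, 40.0, 2.0, 4.0],
})

# Baseline 1: one average across all products.
global_mean = history["units"].mean()
# Baseline 2: one average per category.
category_mean = history.groupby("category")["units"].mean()
# Baseline 3: one average per category and time step after launch.
category_step_mean = history.groupby(["category", "step"])["units"].mean()

def baseline_forecast(category, horizon):
    """Forecast each step after launch, falling back to a coarser
    average whenever the finer one has no data."""
    forecasts = []
    for step in range(horizon):
        if (category, step) in category_step_mean.index:
            forecasts.append(float(category_step_mean.loc[(category, step)]))
        elif category in category_mean.index:
            forecasts.append(float(category_mean.loc[category]))
        else:
            forecasts.append(float(global_mean))
    return forecasts

shoes_forecast = baseline_forecast("shoes", horizon=3)
```

Here the third step for `"shoes"` has no category-and-step average, so the sketch falls back to the category average; an unseen category falls back to the global average.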
-
Available data sources to leverage for cold start forecasting:
- Static covariates (product attributes, categories)
- Time-varying covariates (price, weather, calendar features)
- Images and text descriptions
- Similar product histories
- External data (Google Trends)
-
Model approaches for handling cold starts:
- Global models that learn across all items generally perform better than separate per-item models, which have no history to train on for a new product
- Dummy padding and masking for neural networks
- Vector embeddings for images and text
- Nearest neighbor imputation using similar products
- Transfer learning from existing products
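As one concrete instance of the approaches above, nearest neighbor imputation on top of vector embeddings might be sketched like this. The function name, the cosine-similarity weighting, and the toy embeddings are assumptions; in practice the embeddings would come from an image or text model, which is not shown here.

```python
import numpy as np

def knn_launch_curve(new_embedding, embeddings, curves, k=2):
    """Borrow a launch curve for a new product as the similarity-weighted
    average of the curves of its k most similar existing products."""
    # Cosine similarity between the new product and every existing product.
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(new_embedding)
    sims = embeddings @ new_embedding / norms
    # Indices of the k most similar products.
    nearest = np.argsort(sims)[-k:]
    # Clip so that zero or negative similarities cannot break the weighting.
    weights = np.clip(sims[nearest], 1e-9, None)
    return weights @ curves[nearest] / weights.sum()

# Toy embeddings and sales curves for three existing products (made up).
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
curves = np.array([
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [100.0, 90.0, 80.0],
])

forecast = knn_launch_curve(np.array([1.0, 0.05]), embeddings, curves, k=2)
```

The new product's embedding is close to the first two products, so its imputed curve lies between their curves and ignores the dissimilar third product.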
-
Evaluation considerations:
- Backtesting should reflect real-world scenarios
- Include simple baselines for comparison
- Consider data availability for true-cold testing
- Stratify test samples appropriately
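A small sketch of why baselines and stratification matter when reading backtest results: the error is computed per stratum for a model and for a simple baseline. The column names, the choice of mean absolute error, and all values are assumptions for illustration, not results from the talk.

```python
import pandas as pd

# Hypothetical backtest output: actual demand next to two forecasts.
results = pd.DataFrame({
    "category": ["shoes", "shoes", "hats", "hats"],
    "actual":   [10.0, 20.0, 4.0, 6.0],
    "model":    [12.0, 17.0, 5.0, 9.0],
    "baseline": [15.0, 15.0, 5.0, 5.0],
})

def mae_by_stratum(df, forecast_cols, by="category"):
    """Mean absolute error of each forecast column within each stratum."""
    errors = df[forecast_cols].sub(df["actual"], axis=0).abs()
    return errors.groupby(df[by]).mean()

report = mae_by_stratum(results, ["model", "baseline"])
```

In this toy data the model beats the baseline on `"shoes"` but loses on `"hats"`, a difference that a single aggregate error would hide.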
-
Model selection depends on:
- Data availability and quality
- Business KPIs and risk tolerance
- Scalability requirements
- Forecast horizon length