Tackling the Cold Start Challenge in Demand Forecasting [PyCon DE & PyData Berlin 2024]

Learn effective strategies for demand forecasting without historical data, from basic statistical methods to advanced ML approaches using alternative data sources.

Key takeaways
  • The cold start problem in demand forecasting arises when a new product has no historical sales data, so traditional forecasting methods that depend on that history cannot be applied directly

  • Two main backtesting approaches for cold start scenarios:

    • Pseudo-cold: artificially remove the history of existing products to simulate cold starts
    • True-cold: evaluate model performance on actual product launches
  • Simple statistical baselines often perform comparably to sophisticated models:

    • Historical average sales across all products
    • Average sales per product category
    • Average sales per category and time step after launch
  • Available data sources to leverage for cold start forecasting:

    • Static covariates (product attributes, categories)
    • Time-varying covariates (price, weather, calendar features)
    • Images and text descriptions
    • Similar product histories
    • External data (Google Trends)
  • Model approaches for handling cold starts:

    • Global models that learn across all items perform better than per-item models, which have no history to fit for a new product
    • Padding the missing history with dummy values and masking it for neural networks
    • Vector embeddings for images and text
    • Nearest neighbor imputation using similar products
    • Transfer learning from existing products
  • Evaluation considerations:

    • Backtesting should reflect real-world scenarios
    • Include simple baselines for comparison
    • Consider data availability for true-cold testing
    • Stratify test samples appropriately
  • Model selection depends on:

    • Data availability and quality
    • Business KPIs and risk tolerance
    • Scalability requirements
    • Forecast horizon length
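
The pseudo-cold backtest and the three statistical baselines listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's implementation: the toy data, the tuple layout, and the names SALES, fit_baselines and pseudo_cold_backtest are all assumptions made for this example. It hides the full history of a few items, fits the averages on the remaining items, and reports the mean absolute error (MAE) of each baseline on the hidden items.

```python
from collections import defaultdict
from statistics import mean

# Toy sales history (hypothetical data, not from the talk):
# (item_id, category, step_since_launch, units_sold)
SALES = [
    ("a", "shoes", 0, 10.0), ("a", "shoes", 1, 6.0),
    ("b", "shoes", 0, 14.0), ("b", "shoes", 1, 8.0),
    ("c", "hats", 0, 2.0), ("c", "hats", 1, 4.0),
    ("d", "hats", 0, 4.0), ("d", "hats", 1, 6.0),
    ("e", "shoes", 0, 12.0), ("e", "shoes", 1, 8.0),
]


def fit_baselines(train):
    """Return the global, per-category, and per-category-and-step average sales."""
    by_cat, by_cat_step = defaultdict(list), defaultdict(list)
    for _item, cat, step, units in train:
        by_cat[cat].append(units)
        by_cat_step[cat, step].append(units)
    global_avg = mean(units for *_, units in train)
    cat_avg = {key: mean(values) for key, values in by_cat.items()}
    cat_step_avg = {key: mean(values) for key, values in by_cat_step.items()}
    return global_avg, cat_avg, cat_step_avg


def pseudo_cold_backtest(sales, cold_items):
    """Hide the full history of `cold_items`, fit on the rest, return MAE per baseline."""
    train = [row for row in sales if row[0] not in cold_items]
    test = [row for row in sales if row[0] in cold_items]
    global_avg, cat_avg, cat_step_avg = fit_baselines(train)

    errors = defaultdict(list)
    for _item, cat, step, units in test:
        # Fall back to the coarser baseline when a category or step is unseen in training.
        pred_cat = cat_avg.get(cat, global_avg)
        pred_cat_step = cat_step_avg.get((cat, step), pred_cat)
        errors["global"].append(abs(units - global_avg))
        errors["category"].append(abs(units - pred_cat))
        errors["category_step"].append(abs(units - pred_cat_step))
    return {name: mean(errs) for name, errs in errors.items()}


# Treat items "d" and "e" as new launches by removing their history from training.
mae = pseudo_cold_backtest(SALES, cold_items={"d", "e"})
for name, value in mae.items():
    print(f"{name}: MAE = {value:.2f}")
```

On this toy data the MAE falls from 2.50 (global average) to 2.00 (category average) to 1.25 (category and time step average). That ordering is a property of the invented numbers, not a result from the talk. In a real backtest the same MAE table would also include the more sophisticated models, so that they are compared against the simple baselines on identical held-out items.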