Time series anomaly detection with a human-in-the-loop [PyCon DE & PyData Berlin 2024]

Learn how to combine machine learning with domain expertise for time series anomaly detection using Label Studio, Azure ML, and Python-based automation for efficient expert validation.

Key takeaways
  • Domain expert knowledge is crucial for time series anomaly detection: algorithms alone are not enough without human validation

  • Label Studio serves as the core tool for expert feedback, offering:

    • Easy-to-use web interface for anomaly review
    • Webhook capabilities for automation
    • Support for multiple data formats
    • Programmatic interaction options
  • System architecture combines:

    • Data ingestion pipeline
    • Pre-processing pipeline
    • Anomaly detection pipeline
    • Azure DevOps for orchestration
    • Azure Machine Learning Studio for ML workloads
  • Automated workflow:

    • Starts with unsupervised ML to identify potential anomalies
    • Presents candidates to domain experts for validation
    • Incorporates feedback to improve future detection
    • Runs in batches rather than in real time
  • Key implementation goals:

    • Minimize expert time investment
    • Provide reusable and scalable tooling
    • Enable quick iteration on models
    • Support flexible choice of methods
    • Create labeled datasets for future use cases
  • System design priorities:

    • Python-based implementation (~90% Python code)
    • Modular architecture for reusability
    • Easy-to-use interfaces for domain experts
    • Automated infrastructure setup via Terraform
    • Integration with existing Azure services
  • Focus on practical industrial applications over theoretical approaches: the aim is to get value from data quickly instead of spending months on up-front data labeling
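
The automated workflow in the takeaways (unsupervised detection, expert validation, feedback incorporation, run as a batch) can be illustrated with a minimal sketch. This is not the speakers' implementation. The detector here is a simple robust z-score (median and MAD) chosen only because it needs no training data. The function names `detect_candidates`, `to_review_tasks`, and `apply_feedback`, the task dictionary layout, and the sample readings are all hypothetical. In a real setup the review step would go through a labeling tool such as Label Studio rather than a hard-coded list of verdicts.

```python
from statistics import median


def detect_candidates(values, threshold=3.5):
    """Unsupervised step: return indices whose robust z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # constant series: nothing stands out
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def to_review_tasks(timestamps, values, candidate_idx):
    """Package each candidate as a dict for a (hypothetical) expert review queue."""
    return [
        {"data": {"timestamp": timestamps[i], "value": values[i]},
         "suggested_label": "Anomaly"}
        for i in candidate_idx
    ]


def apply_feedback(tasks, verdicts):
    """Merge expert verdicts (True = confirmed anomaly) into labeled records."""
    return [{**task["data"], "is_anomaly": bool(verdict)}
            for task, verdict in zip(tasks, verdicts)]


# One batch of made-up sensor readings with two suspicious points
timestamps = [f"2024-04-22T00:{m:02d}:00" for m in range(10)]
values = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, 10.1, 2.0]

candidates = detect_candidates(values)          # indices 4 and 9
tasks = to_review_tasks(timestamps, values, candidates)

# Stand-in for the expert's answers: first candidate confirmed, second rejected
labeled = apply_feedback(tasks, [True, False])
```

The `labeled` records are the kind of by-product the takeaways describe: a growing labeled dataset that could later be used to tune the threshold or to train a supervised model.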