Time series anomaly detection with a human-in-the-loop [PyCon DE & PyData Berlin 2024]

Python, Automation, DevOps

Learn how to combine machine learning with domain expertise for time series anomaly detection, using Label Studio, Azure Machine Learning, and Python-based automation to make expert validation efficient.

Key takeaways

Domain expert knowledge is crucial for time series anomaly detection: algorithms alone are not enough without human validation
Label Studio serves as the core tool for expert feedback, offering:
- Easy-to-use web interface for anomaly review
- Webhook capabilities for automation
- Support for multiple data formats
- Programmatic interaction options
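Label Studio's webhooks let the workflow react to expert feedback without polling. Below is a minimal sketch of a receiver that uses only the Python standard library. The payload shape it parses is an assumption about Label Studio's webhook format and should be checked against its documentation: an `action` such as `ANNOTATION_CREATED`, plus `task` and `annotation` objects whose results carry `timeserieslabels`, `start` and `end`. The names `extract_reviewed_intervals`, `FeedbackHandler` and `serve` are hypothetical and do not come from the talk's code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List

# Assumption: Label Studio sends JSON with an "action" field and the saved
# annotation under "annotation"; the time series result layout is also assumed.
HANDLED_ACTIONS = {"ANNOTATION_CREATED", "ANNOTATION_UPDATED"}


def extract_reviewed_intervals(payload: dict) -> List[dict]:
    """Return the labelled time intervals contained in one webhook payload."""
    if payload.get("action") not in HANDLED_ACTIONS:
        return []  # ignore events that do not carry expert feedback
    task_id = payload.get("task", {}).get("id")
    intervals = []
    for item in payload.get("annotation", {}).get("result", []):
        if item.get("type") != "timeserieslabels":
            continue  # skip other result types (choices, text, ...)
        value = item.get("value", {})
        for label in value.get("timeserieslabels", []):
            intervals.append(
                {"task_id": task_id, "start": value.get("start"),
                 "end": value.get("end"), "label": label}
            )
    return intervals


class FeedbackHandler(BaseHTTPRequestHandler):
    """Minimal receiver: parse the POSTed payload and extract the intervals."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        intervals = extract_reviewed_intervals(payload)
        # Hypothetical hook: a real setup might queue these for the next batch run.
        print(f"received {len(intervals)} labelled interval(s)")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, answering webhook calls on the given port."""
    HTTPServer(("", port), FeedbackHandler).serve_forever()
```

You could call `serve(8000)` and register the receiver's URL as a webhook in the Label Studio project. Each saved annotation would then be forwarded to it. The parsing lives in a pure function, so it can be unit-tested without a running server.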
System architecture combines:
- Data ingestion pipeline
- Pre-processing pipeline
- Anomaly detection pipeline
- Azure DevOps for orchestration
- Azure Machine Learning Studio for ML workloads
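The three pipelines can be pictured as independent stages with narrow interfaces. That separation is what lets an orchestrator such as Azure DevOps run them as separate steps and hand the ML workload to Azure Machine Learning. The sketch below is plain Python under assumed inputs: CSV rows of `timestamp,value`, with forward-fill as the pre-processing rule. The function names are hypothetical, and the Azure integration is omitted.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical record type: (timestamp, value); None marks a missing reading.
Record = Tuple[int, Optional[float]]
Point = Tuple[int, float]
Detector = Callable[[List[Point]], List[int]]


def ingest(raw_rows: List[str]) -> List[Record]:
    """Ingestion stage: parse 'timestamp,value' rows (input format is an assumption)."""
    records = []
    for row in raw_rows:
        ts, value = row.split(",")
        records.append((int(ts), float(value) if value.strip() else None))
    return records


def preprocess(records: List[Record]) -> List[Point]:
    """Pre-processing stage: sort by time and forward-fill gaps (leading gaps are dropped)."""
    cleaned: List[Point] = []
    last: Optional[float] = None
    for ts, value in sorted(records, key=lambda r: r[0]):
        if value is None:
            value = last
        if value is not None:
            cleaned.append((ts, value))
            last = value
    return cleaned


def detect(series: List[Point], detector: Detector) -> List[int]:
    """Anomaly detection stage: delegate to a pluggable detector returning flagged timestamps."""
    return detector(series)


def run_pipeline(raw_rows: List[str], detector: Detector) -> List[int]:
    """Run the stages in order, as an orchestrator would run separate jobs."""
    return detect(preprocess(ingest(raw_rows)), detector)
```

`run_pipeline` only composes the stages. Any stage can therefore be rerun or replaced on its own, and the detector is passed in rather than hard-coded.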
Automated workflow:
- Starts with unsupervised ML to identify potential anomalies
- Presents candidates to domain experts for validation
- Incorporates feedback to improve future detection
- Runs in batches rather than in real time
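The batch loop described above can be sketched as follows. The sketch makes three substitutions, all illustrative:

- A global z-score stands in for the unsupervised detector. The talk does not prescribe this method.
- A `review` callback replaces the Label Studio round trip.
- A hypothetical feedback rule adjusts the threshold from the experts' confirmation rate.

All names and the 0.5/0.9 cut-offs are assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple


def zscore_candidates(values: List[float], threshold: float) -> List[int]:
    """Stand-in unsupervised detector: indices whose |z-score| exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def run_batch(
    batch_id: str,
    values: List[float],
    threshold: float,
    review: Callable[[int, float], bool],
    labels: Dict[Tuple[str, int], bool],
) -> float:
    """Process one batch: detect, collect expert verdicts, store labels, return next threshold."""
    candidates = zscore_candidates(values, threshold)
    if not candidates:
        return threshold
    confirmed = 0
    for i in candidates:
        verdict = review(i, values[i])  # in production: the expert's decision in Label Studio
        labels[(batch_id, i)] = verdict
        confirmed += int(verdict)
    precision = confirmed / len(candidates)
    # Hypothetical feedback rule: be stricter when experts reject most candidates,
    # more sensitive when they confirm nearly all of them.
    if precision < 0.5:
        return threshold + 0.5
    if precision > 0.9:
        return max(1.0, threshold - 0.5)
    return threshold
```

Run once per batch, for example on a schedule. The threshold returned by one batch becomes the input to the next, and `labels` accumulates the experts' verdicts across batches.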
Key implementation goals:
- Minimize expert time investment
- Provide reusable and scalable tooling
- Enable quick iteration on models
- Support flexible choice of methods
- Create labeled datasets for future use cases
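Several of these goals translate directly into code:

- A review budget bounds how many candidates experts see.
- A registry keeps the choice of method flexible.
- Exporting verdicts builds a reusable labeled dataset.

A minimal sketch follows, assuming simple per-point scoring functions. The registry, the two scorers and the CSV layout are hypothetical examples rather than the talk's implementation.

```python
import csv
import io
from statistics import median
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: detection methods can be swapped by name without touching callers.
Scorer = Callable[[List[float]], List[float]]
DETECTORS: Dict[str, Scorer] = {}


def register(name: str) -> Callable[[Scorer], Scorer]:
    def wrap(fn: Scorer) -> Scorer:
        DETECTORS[name] = fn
        return fn
    return wrap


@register("abs_diff")
def abs_diff_scores(values: List[float]) -> List[float]:
    """Score each point by its jump from the previous point (the first point scores 0)."""
    return [0.0] + [abs(b - a) for a, b in zip(values, values[1:])]


@register("median_dev")
def median_deviation_scores(values: List[float]) -> List[float]:
    """Score each point by its distance from the series median."""
    m = median(values)
    return [abs(v - m) for v in values]


def select_for_review(values: List[float], method: str, budget: int) -> List[int]:
    """Keep only the `budget` highest-scoring points to bound the experts' effort."""
    scores = DETECTORS[method](values)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])  # present candidates in time order


def export_labels(labels: List[Tuple[int, float, bool]]) -> str:
    """Serialize reviewed points as CSV so later use cases can reuse them as training data."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["index", "value", "is_anomaly"])
    writer.writerows(labels)
    return buf.getvalue()
```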
System design priorities:
- Python-based implementation (~90% Python code)
- Modular architecture for reusability
- Easy-to-use interfaces for domain experts
- Automated infrastructure setup via Terraform
- Integration with existing Azure services
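One way to keep the expert interface reusable across use cases is to generate the Label Studio labeling configuration from the list of sensor channels, rather than writing it by hand for each project. The sketch below builds the XML with the `View`, `TimeSeriesLabels`, `Label`, `TimeSeries` and `Channel` tags, following Label Studio's time series template as we understand it. The attribute values (`$csv`, the `time` column, the `Anomaly` label) and the function name are assumptions to verify against the Label Studio tag reference.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def build_labeling_config(channels: List[str], labels: Iterable[str] = ("Anomaly",)) -> str:
    """Build a time series labeling config with one Channel per sensor column."""
    view = ET.Element("View")
    # Region labels the expert can draw on the time axis (label names are assumptions).
    ts_labels = ET.SubElement(view, "TimeSeriesLabels", name="label", toName="ts")
    for label in labels:
        ET.SubElement(ts_labels, "Label", value=label)
    # Assumed data layout: each task points to a CSV with a "time" column.
    ts = ET.SubElement(
        view, "TimeSeries", name="ts", valueType="url", value="$csv",
        sep=",", timeColumn="time",
    )
    for column in channels:
        ET.SubElement(ts, "Channel", column=column)
    return ET.tostring(view, encoding="unicode")
```

The resulting string could be set as a project's labeling config, for example through the Label Studio API. Every new data source would then get the same interface without manual setup.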
Focus on practical industrial applications rather than theoretical approaches, with an emphasis on getting value from data quickly instead of spending months on up-front data labeling
