One analysis a day keeps anomalies away! — Madalina Ciortan

Python Automation

Learn practical tips for anomaly detection using Python libraries like PyOD & PySAD. Explore validation techniques, thresholding methods & root cause analysis for real-world applications.

Key takeaways

Most anomaly detection work happens in unsupervised settings with unlabeled data, making it technically challenging to validate results and set appropriate thresholds
PyOD is recommended as a primary library for anomaly detection, offering 50+ algorithms for tabular data and extensions for time series and graph data
For streaming/real-time anomaly detection, the PySAD library provides 16 dedicated algorithms and ensemblers, with LODA (Lightweight On-line Detector of Anomalies) and xStream highlighted as solutions
Time series data can be transformed into tabular format using feature engineering (e.g., tsfresh), so that the extensive research and methods available for tabular data can be applied to it
Because anomalies are rare, evaluation should rely on metrics suited to extreme class imbalance, such as AUC, F1 score, precision, and recall, rather than plain accuracy
ADBench provides 57 benchmark datasets across various industries for testing anomaly detection approaches
Root cause analysis and causal discovery (using libraries like causal-learn and DoWhy) help understand why anomalies occur
LLMs underperform traditional anomaly detection methods by approximately 30% according to recent research
Dynamic thresholding (PyThresh library with 30 algorithms) can help automatically adjust anomaly thresholds over time
For multivariate time series analysis, the darts library is recommended with its 40+ forecasting algorithms
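
The points above about thresholding and imbalance-aware evaluation can be made concrete with a small sketch. It uses only the Python standard library and does not reproduce the PyOD or PyThresh APIs. The scores, labels, the `mad_threshold` helper, and the choice of `k=3.0` are all illustrative assumptions, with a median absolute deviation (MAD) rule standing in for the kind of threshold those libraries automate.

```python
from statistics import median


def mad_threshold(scores, k=3.0):
    """Return a cutoff k scaled MADs above the median score (hypothetical helper)."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    # 1.4826 rescales the MAD to match a standard deviation under normality.
    return med + k * 1.4826 * mad


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the anomaly class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of points labeled correctly, anomalies and normals alike."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up anomaly scores (higher = more anomalous) with 2 true anomalies in 20.
scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.10, 0.95,
          0.11, 0.09, 0.12, 0.10, 0.13, 0.11, 0.10, 0.12, 0.88, 0.40]
y_true = [0] * 9 + [1] + [0] * 8 + [1] + [0]

# Turn continuous scores into binary labels with the MAD-based cutoff.
cutoff = mad_threshold(scores)
y_pred = [1 if s > cutoff else 0 for s in scores]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# A trivial detector that never raises an alert, for comparison.
baseline_pred = [0] * len(y_true)
baseline_acc = accuracy(y_true, baseline_pred)
_, baseline_recall, _ = precision_recall_f1(y_true, baseline_pred)

print(f"cutoff={cutoff:.3f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"baseline accuracy={baseline_acc:.2f} baseline recall={baseline_recall:.2f}")
```

On this toy data the never-alert baseline is right on 18 of 20 points while catching no anomalies, which is why accuracy alone says little here. In an unsupervised setting the labels used above would typically be unavailable or limited to a small validated sample.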
