Emeli Dral: Detecting drift: how to evaluate and explore data drift in machine learning systems

Emeli Dral explains how to evaluate and explore data drift in machine learning systems: detecting drift with sampling, distance metrics, and parametric and non-parametric statistical tests, and addressing drift-related issues with regularization and data quality checks.

Key takeaways

There is no one-size-fits-all approach to detecting data drift, but several options are available, including sampling from the data and using distance metrics such as the Wasserstein distance
Parametric and non-parametric tests can both be useful in detecting data drift, but each has its own strengths and weaknesses
When dealing with large amounts of data, it may be necessary to use non-parametric tests to avoid over-sensitivity and excessive false positives
Several approaches can help address data drift, including procedures for detecting and responding to alerts and techniques such as regularization
Data quality checks, such as monitoring for missing values or changes in correlation between features, can identify problems in the data before they degrade model accuracy
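The first takeaway pairs sampling with a distance metric. Below is a minimal sketch of that idea in plain Python, assuming a single numeric feature stored as a list: draw a fixed-size random sample from both the reference and the current data, compute the Wasserstein-1 distance between their empirical distributions, scale it by the reference standard deviation, and compare the result to a threshold. The function names (`subsample`, `wasserstein_1d`, `detect_drift`), the standard-deviation scaling, and the default `sample_size` and `threshold` values are hypothetical choices for illustration, not taken from the talk. In practice, `scipy.stats.wasserstein_distance` computes the same unscaled distance.

```python
import bisect
import random
import statistics

# Hypothetical helpers for illustration; names, scaling and defaults are assumptions.


def subsample(values, k, seed=0):
    """Return at most k values drawn without replacement, reproducibly via seed."""
    values = list(values)
    if len(values) <= k:
        return values
    return random.Random(seed).sample(values, k)


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D samples.

    Equals the area between the two empirical CDFs: integral of |F_u(x) - F_v(x)| dx.
    """
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right), so each gap adds a rectangle.
        f_u = bisect.bisect_right(u, left) / len(u)
        f_v = bisect.bisect_right(v, left) / len(v)
        total += abs(f_u - f_v) * (right - left)
    return total


def detect_drift(reference, current, sample_size=1000, threshold=0.1, seed=0):
    """Return (score, drifted): Wasserstein distance on fixed-size samples, in reference std units."""
    ref = subsample(reference, sample_size, seed)
    cur = subsample(current, sample_size, seed)
    scale = statistics.pstdev(ref) or 1.0  # fall back to 1.0 for a constant reference feature
    score = wasserstein_1d(ref, cur) / scale
    return score, score > threshold


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    unchanged = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    shifted = [rng.gauss(0.5, 1.0) for _ in range(20_000)]
    print("unchanged:", detect_drift(reference, unchanged))
    print("shifted:  ", detect_drift(reference, shifted))
```

Because the sample size is fixed, the sampling noise in the score stays at the same level whether a batch has thousands or millions of rows, while a genuine shift of half a standard deviation should score roughly 0.5 in either case.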
