Emeli Dral: Detecting drift: how to evaluate and explore data drift in machine learning systems

In this talk, Emeli Dral covers how to evaluate and explore data drift in machine learning systems: detecting drift with sampling, distance measures, and parametric and non-parametric tests, and responding to it with techniques such as regularization and data quality checks.

Key takeaways
  • There is no one-size-fits-all approach to detecting data drift; options include sampling from the data and using distance measures such as the Wasserstein distance
  • Parametric and non-parametric tests can both be useful in detecting data drift, but each has its own strengths and weaknesses
  • When dealing with large amounts of data, it may be necessary to use non-parametric tests to avoid over-sensitivity and excessive false positives
  • Data drift can be addressed in several ways, including setting up procedures to detect and respond to alerts and applying techniques such as regularization
  • Data quality checks, such as monitoring for missing values or correlated data, can be used to identify problems in the data before they lead to inaccuracies in the model
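
To make the takeaways above concrete, here is a minimal sketch of two of the checks they mention: a Wasserstein distance computed on equal-size random samples of a single numeric feature, and a missing-value share. It is an illustration under stated assumptions, not the method presented in the talk. The function names (sample, wasserstein_1d, drift_score, missing_share), the default sample size of 1000, and the scaling by the reference standard deviation are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def sample(values, n, seed=0):
    """Draw n values at random without replacement (all values if there are n or fewer)."""
    values = list(values)
    if len(values) <= n:
        return values
    return random.Random(seed).sample(values, n)


def wasserstein_1d(a, b):
    """1-D Wasserstein distance between two equal-size samples.

    For equal-size empirical samples this is the mean absolute
    difference between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return statistics.fmean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def drift_score(reference, current, n=1000, seed=0):
    """Wasserstein distance on equal-size samples, scaled by the reference spread.

    Scaling by the standard deviation is an assumption of this sketch; it
    makes the score comparable across features with different units.
    """
    size = min(n, len(reference), len(current))
    if size == 0:
        raise ValueError("both datasets must be non-empty")
    ref = sample(reference, size, seed)
    cur = sample(current, size, seed + 1)
    scale = statistics.pstdev(ref) or 1.0  # avoid dividing by zero for constant features
    return wasserstein_1d(ref, cur) / scale


def missing_share(values):
    """Fraction of entries that are None or NaN."""
    values = list(values)
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)
```

In practice the score would be compared against a threshold tuned for the feature and the use case, and the missing-value share against the level seen in the reference data. For samples of unequal size, scipy.stats.wasserstein_distance computes the same distance without the equal-size restriction.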