Machine Learning with Apache Beam - Danny McCormick, Google - Open Source 101

Learn how to leverage Apache Beam for efficient machine learning with Google's Danny McCormick, covering model evaluation, deployment, and more.

Key takeaways

Apache Beam provides a platform for efficient model evaluation and deployment in machine learning pipelines.
Beam can run on top of Spark, Flink, and other execution engines, providing flexibility and scalability.
Model evaluation is a critical step in the machine learning pipeline, ensuring that models are accurate and reliable.
Beam’s distributed processing capabilities can handle large datasets and perform complex computations efficiently.
TensorFlow Model Analysis (TFMA) is a tool for comparing models and deciding which is best suited to a given task.
Online training can be challenging in a distributed environment, but Beam’s primitives can help.
Using multiple models in a single pipeline is possible with Beam, allowing for increased accuracy and robustness.
Data validation and pre-processing are crucial steps in the machine learning pipeline, and Beam can assist with these tasks.
Efficient batching of data is important for processing large datasets, and Beam’s primitives can help with this.
Model deployment is critical for making machine learning models available for use, and Beam can aid in this process.
Open-source alternatives like Apache Beam can provide cost-effective and flexible solutions for machine learning tasks.
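
The batching and model-comparison takeaways above can be illustrated with a minimal sketch in plain Python. It does not use the Beam API; Beam ships its own transforms for these steps (for example `BatchElements` and `RunInference` in the Python SDK), and TFMA handles evaluation at scale. The names `batch_elements`, `model_a`, `model_b`, and `accuracy` are hypothetical, and the two "models" are toy stand-ins rather than anything shown in the talk.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def batch_elements(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of elements into lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-ins for real models: each maps a batch of inputs
# to a list of predictions, one per input.
def model_a(batch: Sequence[int]) -> List[bool]:
    return [x % 2 == 0 for x in batch]  # predicts "is even"


def model_b(batch: Sequence[int]) -> List[bool]:
    return [True for _ in batch]  # always predicts True


def accuracy(
    model: Callable[[Sequence[int]], List[bool]],
    examples: Iterable[Tuple[int, bool]],
    batch_size: int,
) -> float:
    """Run the model batch by batch and return the fraction of correct predictions."""
    correct = 0
    total = 0
    for batch in batch_elements(examples, batch_size):
        inputs = [x for x, _ in batch]
        labels = [y for _, y in batch]
        predictions = model(inputs)
        correct += sum(p == y for p, y in zip(predictions, labels))
        total += len(batch)
    return correct / total if total else 0.0


# Labeled examples: the label says whether the input is even.
examples = [(x, x % 2 == 0) for x in range(10)]

# Evaluate both models on the same data and keep the better one.
scores = {
    "model_a": accuracy(model_a, examples, batch_size=4),
    "model_b": accuracy(model_b, examples, batch_size=4),
}
best = max(scores, key=scores.get)
print(scores, best)
```

In a real pipeline the batching and inference steps would run as distributed transforms on a runner such as Spark or Flink, so the per-batch loop here only illustrates the shape of the computation.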
