Talks - Juliana Ferreira Alves: Improve Your ML Projects: Embrace Reproducibility and Production...

Python Automation DevOps

Learn how to improve machine learning projects with Kedro, a Python framework for building reproducible data pipelines that bridge data science and production.

Key takeaways

Kedro is a Python framework for creating scalable data science pipelines that emphasizes reproducibility and production readiness
Key features of Kedro:
- Project template with standardized directory structure
- Data catalog for managing data sources and connections
- Pipeline organization with nodes (processing steps)
- Automatic experiment tracking and metrics logging
- Visualization tools like Kedro-Viz for pipeline inspection
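
To make the node and pipeline idea concrete, here is a minimal plain-Python sketch. It is not Kedro's API (Kedro provides its own `node` and `pipeline` constructs in `kedro.pipeline`). The `Node` class, the `run_pipeline` function, and the dataset names are hypothetical, chosen only to show how processing steps can be wired together through named inputs and outputs.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Node:
    """One processing step: a function plus the names of its inputs and output."""
    func: Callable[..., Any]
    inputs: tuple[str, ...]
    output: str


def run_pipeline(nodes: list[Node], catalog: dict[str, Any]) -> dict[str, Any]:
    """Run nodes in dependency order, reading and writing datasets by name."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        # A node is ready once all of its named inputs are available.
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted(
                {name for n in pending for name in n.inputs if name not in data}
            )
            raise ValueError(f"unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def drop_missing(rows: list[dict]) -> list[dict]:
    """Keep only rows that have a value."""
    return [r for r in rows if r["value"] is not None]


def mean_value(rows: list[dict]) -> float:
    """Average the 'value' field over all rows."""
    return sum(r["value"] for r in rows) / len(rows)


# Nodes are declared out of order on purpose: names decide the execution order.
nodes = [
    Node(mean_value, ("clean_rows",), "mean"),
    Node(drop_missing, ("raw_rows",), "clean_rows"),
]
catalog = {"raw_rows": [{"value": 1.0}, {"value": None}, {"value": 3.0}]}
result = run_pipeline(nodes, catalog)
```

Because each step only declares the names it reads and writes, the functions stay free of I/O and can be tested or reused on their own, which is the property that makes this style easier to move toward production.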
Helps bridge the gap between data scientists and ML engineers:
- Makes code more production-ready
- Improves communication between team members
- Standardizes project structure
- Makes projects more reproducible and shareable
Supports multiple data sources and environments:
- Local files
- Cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage)
- Hadoop filesystems
- HTTP endpoints
- Multiple runtime environments (dev, prod)
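
One way to picture how the same dataset name can point at different storage per environment is the plain-Python sketch below. In Kedro itself this kind of configuration lives in YAML files under the project's `conf/` directory. The dictionaries, the bucket name, the file paths, and the helper functions here are hypothetical, and the merge rule (an environment entry replaces the base entry with the same key) is an assumption made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical catalog entries: BASE is shared, each environment overrides it.
BASE = {
    "reviews": {"type": "csv", "filepath": "data/01_raw/reviews.csv"},
    "model": {"type": "pickle", "filepath": "data/06_models/model.pkl"},
}
OVERRIDES = {
    "dev": {},
    "prod": {
        "reviews": {"type": "csv", "filepath": "s3://example-bucket/raw/reviews.csv"},
    },
}


def resolve_catalog(env: str) -> dict[str, dict[str, str]]:
    """Return base entries, with the environment's entries replacing them by key."""
    if env not in OVERRIDES:
        raise KeyError(f"unknown environment: {env!r}")
    return {**BASE, **OVERRIDES[env]}


def storage_protocol(filepath: str) -> str:
    """Infer the storage backend from the URL scheme; no scheme means a local file."""
    return urlparse(filepath).scheme or "file"


dev_catalog = resolve_catalog("dev")
prod_catalog = resolve_catalog("prod")
dev_protocol = storage_protocol(dev_catalog["reviews"]["filepath"])
prod_protocol = storage_protocol(prod_catalog["reviews"]["filepath"])
```

The pipeline code only ever asks for `reviews`. Whether that resolves to a local file in development or to cloud storage in production is decided by configuration, not by the processing code.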
Best practices for using Kedro:
- Start experimentation in notebooks
- Move successful experiments to Kedro pipelines
- Use configuration files for parameters
- Store metrics and model artifacts systematically
- Implement version control
- Create Docker containers for deployment
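
Two of the practices above, reading parameters from configuration and storing metrics systematically, can be sketched with the standard library alone. This is an illustration rather than Kedro's own mechanism. The toy threshold "model", the `run_001` directory name, and the `metrics.json` file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def train_threshold_model(values: list[float], labels: list[int], params: dict) -> dict:
    """Toy 'model': predict 1 when a value exceeds params['threshold']."""
    threshold = params["threshold"]
    predictions = [int(v > threshold) for v in values]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"threshold": threshold, "accuracy": correct / len(labels)}


def save_metrics(metrics: dict, run_dir: Path) -> Path:
    """Write metrics as JSON so every run leaves a comparable record."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return path


# Parameters would normally be read from a configuration file; they are inlined
# here as a JSON string to keep the sketch self-contained.
params = json.loads('{"threshold": 0.5}')
metrics = train_threshold_model([0.2, 0.7, 0.9, 0.4], [0, 1, 1, 1], params)

with tempfile.TemporaryDirectory() as tmp:
    metrics_path = save_metrics(metrics, Path(tmp) / "run_001")
    saved = json.loads(metrics_path.read_text())
```

Keeping the threshold out of the function body means an experiment can be rerun with different settings by editing configuration only, and the stored JSON records which setting produced which result.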
Not meant to replace:
- Data infrastructure
- MLOps frameworks
- Other orchestration tools
- Experimentation environments
