Talks - Juliana Ferreira Alves: Improve Your ML Projects: Embrace Reproducibility and Production...
Learn how to improve machine learning projects with Kedro, a Python framework for building reproducible data pipelines that bridge data science and production.
Kedro is a Python framework for building scalable data science pipelines, with an emphasis on reproducibility and production readiness.
Key features of Kedro:
- Project template with standardized directory structure
- Data catalog for managing data sources and connections
- Pipeline organization with nodes (processing steps)
- Experiment tracking and metrics logging
- Visualization tools such as Kedro-Viz for pipeline inspection
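The following minimal, stdlib-only sketch makes the node, pipeline and data catalog ideas concrete. It is a toy imitation, not Kedro's API. `Node`, `run_pipeline` and the dict used as a catalog are hypothetical names. In Kedro itself, nodes are created with `kedro.pipeline.node` and datasets are declared in `conf/base/catalog.yml`. The sketch shows that each node is a plain function with named inputs and outputs, so execution order follows from those names rather than from the order in which nodes are listed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """One processing step: a plain function plus named inputs and one named output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str
    name: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, reading from and writing to a dict-based catalog."""
    catalog = dict(catalog)  # work on a copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        # A node is ready once every one of its named inputs exists in the catalog.
        ready = [n for n in pending if all(i in catalog for i in n.inputs)]
        if not ready:
            missing = sorted({i for n in pending for i in n.inputs if i not in catalog})
            raise ValueError(f"Unresolvable inputs: {missing}")
        for n in ready:
            catalog[n.output] = n.func(*(catalog[i] for i in n.inputs))
            pending.remove(n)
    return catalog


# Example nodes: pure functions with no knowledge of where data is stored.
def clean(raw):
    return [x for x in raw if x is not None]


def mean(values):
    return sum(values) / len(values)


pipeline = [
    Node(mean, ["clean_data"], "avg", "compute_mean"),  # deliberately listed first
    Node(clean, ["raw_data"], "clean_data", "clean_raw"),
]
inputs = {"raw_data": [1, None, 2, 3]}
result = run_pipeline(pipeline, inputs)
print(result["avg"])  # 2.0
```

In this sketch, nodes communicate only through named datasets. Replacing the in-memory dict with files or cloud storage would therefore not require changing `clean` or `mean`. Decoupling processing code from storage in this way is the job of a data catalog.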
Kedro helps bridge the gap between data scientists and ML engineers:
- Makes code more production-ready
- Improves communication between team members
- Standardizes project structure
- Makes projects more reproducible and shareable
Kedro supports multiple data sources and environments:
- Local files
- Cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage)
- Hadoop Distributed File System (HDFS)
- HTTP endpoints
- Multiple configuration environments (e.g., dev, prod)
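The sketch below shows, under simplifying assumptions, how a single set of catalog entries can point to different storage backends in different environments. Everything in it is hypothetical. Configuration is held in plain dicts rather than Kedro's YAML files. `resolve` assumes an environment replaces whole top-level entries of the base config. `backend_for` only inspects the URL scheme. Kedro itself keeps environment-specific configuration under `conf/<env>`, selects an environment with `kedro run --env=<env>`, and relies on fsspec for remote filesystems. Check the configuration documentation for your Kedro version to confirm its exact merge rules.

```python
from typing import Dict
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a readable backend label.
BACKENDS = {
    "": "local file",
    "file": "local file",
    "s3": "Amazon S3",
    "gcs": "Google Cloud Storage",
    "gs": "Google Cloud Storage",
    "abfs": "Azure Blob Storage",
    "hdfs": "HDFS",
    "http": "HTTP endpoint",
    "https": "HTTP endpoint",
}


def backend_for(filepath: str) -> str:
    """Classify a dataset location by the scheme of its path or URL."""
    return BACKENDS.get(urlparse(filepath).scheme, "unknown")


def resolve(base: Dict[str, dict], env: Dict[str, dict]) -> Dict[str, dict]:
    """Overlay environment entries on base entries (assumption: whole entries are replaced)."""
    merged = dict(base)
    merged.update(env)
    return merged


# Hypothetical catalog: local paths in base, an S3 path in a "prod" environment.
base = {
    "companies": {"filepath": "data/01_raw/companies.csv"},
    "model": {"filepath": "data/06_models/model.pkl"},
}
prod = {
    "companies": {"filepath": "s3://my-bucket/raw/companies.csv"},
}

catalog = resolve(base, prod)
for name, entry in catalog.items():
    print(name, "->", backend_for(entry["filepath"]))
# companies -> Amazon S3
# model -> local file
```

The pipeline code never sees these paths. Only the selected environment determines whether `companies` is read from disk or from S3, which is what makes moving from development to production a configuration change rather than a code change.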
Best practices for using Kedro:
- Start experimentation in notebooks
- Move successful experiments to Kedro pipelines
- Use configuration files for parameters
- Store metrics and model artifacts systematically
- Use version control for code and configuration
- Create Docker containers for deployment
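Below is a minimal, stdlib-only sketch of two of these practices: reading parameters from a configuration file and storing metrics systematically. It makes the following assumptions. Parameters are stored as JSON, whereas Kedro projects usually use YAML (e.g. `conf/base/parameters.yml`). `load_params`, `save_versioned` and `latest` are hypothetical helpers. The `<name>/<timestamp>/<name>` directory layout is loosely modelled on how Kedro's versioned datasets arrange saved files, but it is not Kedro's implementation.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_params(path: Path) -> dict:
    """Read run parameters from a config file instead of hard-coding them."""
    return json.loads(path.read_text())


def save_versioned(root: Path, name: str, payload: dict) -> Path:
    """Write payload to root/name/<UTC timestamp>/name and return the file path."""
    version = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
    target = root / name / version / name
    target.parent.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier run
    target.write_text(json.dumps(payload, indent=2))
    return target


def latest(root: Path, name: str) -> dict:
    """Load the most recent version; fixed-width timestamps sort chronologically."""
    newest_dir = sorted((root / name).iterdir())[-1]
    return json.loads((newest_dir / name).read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    params_file = root / "parameters.json"  # hypothetical file name
    params_file.write_text(json.dumps({"test_size": 0.2, "random_state": 3}))
    params = load_params(params_file)

    metrics_dir = root / "metrics"
    save_versioned(metrics_dir, "metrics.json", {"accuracy": 0.81})
    save_versioned(metrics_dir, "metrics.json", {"accuracy": 0.84})
    newest = latest(metrics_dir, "metrics.json")
    n_versions = len(list((metrics_dir / "metrics.json").iterdir()))

print(params["test_size"], newest, n_versions)  # 0.2 {'accuracy': 0.84} 2
```

Because each run is written to a new timestamped directory and existing directories are never overwritten, earlier results remain available for comparison. Keeping them is what allows a past result to be reproduced or audited.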
Kedro is not meant to replace:
- Data infrastructure
- MLOps frameworks
- Orchestration tools
- Experimentation environments