Talks - Juliana Ferreira Alves: Improve Your ML Projects: Embrace Reproducibility and Production...
Learn how to improve machine learning projects with Kedro, a Python framework for building reproducible data pipelines that bridge data science and production.
- Kedro is a Python framework for creating scalable data science pipelines that emphasizes reproducibility and production readiness.
- Key features of Kedro:
    - Project template with a standardized directory structure
    - Data catalog for managing data sources and connections
    - Pipelines organized as nodes (processing steps)
    - Automatic experiment tracking and metrics logging
    - Visualization tools such as Kedro-Viz for pipeline inspection
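The node and catalog ideas can be illustrated without Kedro itself. The following is a minimal plain-Python sketch, not Kedro's API: `Node`, `run_pipeline`, and the dataset names are hypothetical stand-ins chosen for illustration. Each step declares the names of its inputs and output, and those names (rather than the order of declaration) decide when a step runs.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One processing step: a function plus the names of its inputs and output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run every node once its named inputs are available (hypothetical runner)."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted({i for n in pending for i in n.inputs if i not in data})
            raise ValueError(f"Unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def clean(raw: List[float]) -> List[float]:
    """Drop negative readings."""
    return [x for x in raw if x >= 0]


def average(values: List[float]) -> float:
    """Mean of the cleaned values."""
    return sum(values) / len(values)


# Nodes are declared out of order on purpose; dataset names decide the order.
nodes = [
    Node(average, ["clean_readings"], "mean_reading"),
    Node(clean, ["raw_readings"], "clean_readings"),
]
# The "catalog" here is just a dict mapping dataset names to in-memory data.
catalog = {"raw_readings": [1.0, -2.0, 3.0]}

result = run_pipeline(nodes, catalog)
print(result["mean_reading"])
```

Because every intermediate result is stored under a name, each step can be inspected or re-run in isolation, which is the property the talk associates with reproducible pipelines.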
 
- Helps bridge the gap between data scientists and ML engineers:
    - Makes code more production-ready
    - Improves communication between team members
    - Standardizes project structure
    - Makes projects more reproducible and shareable
 
- Supports multiple data sources and environments:
    - Local files
    - Cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage)
    - Hadoop filesystems
    - HTTP endpoints
    - Multiple runtime environments (dev, prod)
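One way to picture per-environment data sources is a base catalog whose entries are overridden by an environment-specific file. The sketch below is plain Python under stated assumptions: `merge_config`, `storage_protocol`, the `readings` entry, and the bucket name are all hypothetical, and Kedro's own configuration loader may merge settings differently.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied recursively (illustrative only)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def storage_protocol(filepath: str) -> str:
    """Return the URL scheme of a path, or 'file' for plain local paths."""
    return urlparse(filepath).scheme or "file"


# Shared defaults: the dataset lives in a local file.
base_catalog = {"readings": {"type": "csv", "filepath": "data/01_raw/readings.csv"}}
# Hypothetical production override: same dataset name, different location.
prod_override = {"readings": {"filepath": "s3://example-bucket/readings.csv"}}

dev_catalog = merge_config(base_catalog, {})
prod_catalog = merge_config(base_catalog, prod_override)

print(storage_protocol(dev_catalog["readings"]["filepath"]))
print(storage_protocol(prod_catalog["readings"]["filepath"]))
```

The pipeline code refers only to the dataset name `readings`, so switching from dev to prod changes where the data is read from without touching the processing steps.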
 
- Best practices for using Kedro:
    - Start experimentation in notebooks
    - Move successful experiments to Kedro pipelines
    - Use configuration files for parameters
    - Store metrics and model artifacts systematically
    - Implement version control
    - Create Docker containers for deployment
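Two of these practices, keeping parameters in configuration files and storing metrics systematically, can be sketched in a few lines of standard-library Python. This is an illustration rather than Kedro's mechanism: the function names, the `threshold` parameter, the toy scores, and the timestamped `runs/` layout are assumptions, and JSON is used only to avoid third-party dependencies.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def load_params(path: Path) -> Dict[str, Any]:
    """Read parameters from a config file instead of hard-coding them."""
    return json.loads(path.read_text())


def evaluate(params: Dict[str, Any], scores: List[float], labels: List[int]) -> Dict[str, float]:
    """Stand-in for model evaluation: threshold the scores and compute accuracy."""
    predictions = [1 if s >= params["threshold"] else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


def save_run(root: Path, params: Dict[str, Any], metrics: Dict[str, float]) -> Path:
    """Store the parameters and metrics of one run in a timestamped directory."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = root / run_id
    run_dir.mkdir(parents=True)
    (run_dir / "params.json").write_text(json.dumps(params, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    params_file = root / "parameters.json"
    params_file.write_text(json.dumps({"threshold": 0.5}))

    params = load_params(params_file)
    metrics = evaluate(params, scores=[0.2, 0.7, 0.9, 0.4], labels=[0, 1, 1, 1])
    run_dir = save_run(root / "runs", params, metrics)
    saved_metrics = json.loads((run_dir / "metrics.json").read_text())

print(metrics)
```

Saving the parameters next to the metrics means any reported number can be traced back to the exact configuration that produced it.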
 
- Kedro is not meant to replace:
    - Data infrastructure
    - MLOps frameworks
    - Other orchestration tools
    - Experimentation environments