Pavithra Eswaramoorthy & Jaime Rodríguez-Guerra - Ensuring Runtime Reproducibility in Python
Learn key strategies for Python runtime reproducibility, from environment management to risk mitigation. Explore tools like Conda, Docker & best practices for reliable code execution.
-
Reproducibility requires proactive planning and cannot be an afterthought; it needs to be modeled early in the development process
-
The runtime reproducibility framework consists of four key steps:
- Define objectives and scope
- Enumerate components
- Evaluate threats/risks
- Apply mitigation measures
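One way to make the four steps above concrete is to record them as data, so that unaddressed risks are easy to list. The sketch below is only an illustration: the `Component` and `ReproducibilityPlan` classes, their fields, and the example entries are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    """Step 2: one thing the project needs at runtime (hypothetical structure)."""
    name: str
    # Step 3 and 4: each identified risk maps to its mitigation,
    # or to None when no mitigation has been applied yet.
    risks: Dict[str, Optional[str]] = field(default_factory=dict)


@dataclass
class ReproducibilityPlan:
    """Step 1: the objective and scope, plus the enumerated components."""
    objective: str
    components: List[Component] = field(default_factory=list)

    def open_risks(self) -> List[str]:
        """Return every risk that has no mitigation recorded."""
        return [
            f"{component.name}: {risk}"
            for component in self.components
            for risk, mitigation in component.risks.items()
            if mitigation is None
        ]


# Illustrative entries only; a real plan would list the project's own components.
plan = ReproducibilityPlan(
    objective="Re-run the analysis on Linux x86-64 for the next two years",
    components=[
        Component(
            "Python interpreter",
            {"pinned version no longer distributed": "archive the installer internally"},
        ),
        Component(
            "package channel",
            {"channel goes offline": "internal mirror", "package removed upstream": None},
        ),
    ],
)

print(plan.open_risks())
```

Anything returned by `open_risks()` is either a gap to close or a risk the team has decided to accept; the point is that the decision becomes visible.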
-
Key considerations for reproducibility:
- Explicit source of packages and Python interpreters
- Platform OS specifications
- Hardware requirements
- Dependency versions
- Data storage locations
- Infrastructure components
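Several of the considerations above (interpreter, operating system, hardware architecture) can be captured at runtime with the standard library alone. This is a minimal sketch; the function name `runtime_snapshot` and the choice of fields are assumptions made for illustration, and it does not cover data locations or infrastructure.

```python
import json
import platform
import sys


def runtime_snapshot() -> dict:
    """Collect basic interpreter, OS, and hardware details for the current process."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "python_executable": sys.executable,  # which interpreter is actually running
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),  # CPU architecture as reported by the OS
    }


# Store this next to the results so a later run can be compared against it.
print(json.dumps(runtime_snapshot(), indent=2))
```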
-
Best practices include:
- Using version control
- Creating environment files (environment.yml)
- Generating dependency logs
- Building in redundancy for critical components
- Using internal mirrors when needed
- Restricting channels/versions as appropriate
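As an illustration of generating a dependency log, the sketch below lists every distribution visible to the running interpreter as a pinned `name==version` line, using only `importlib.metadata` from the standard library. The function name `dependency_log` is hypothetical. Note the limitation: this only sees packages that ship Python packaging metadata, so non-Python packages installed by Conda will not appear; for those, Conda's own export commands (`conda env export`, `conda list --explicit`) are the likelier fit.

```python
from importlib import metadata
from typing import List


def dependency_log() -> List[str]:
    """Return sorted 'name==version' lines for all installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Write the output to a file and commit it alongside the code.
for line in dependency_log():
    print(line)
```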
-
Tools and approaches:
- Conda/Mamba for environment management
- Docker containers for isolation
- Virtual machines for full system reproducibility
- conda-store for simplified environment management
- watermark for tracking runtime details
-
Workflows must enable and encourage reproducibility: if the process is too complex, users will fall back on less reproducible patterns
-
Different levels of reproducibility exist; teams need to decide consciously which level is appropriate for their needs and accept the associated risks
-
Reproducibility in data science is complicated by:
- Fast-moving ecosystem
- Multiple packaging systems
- Dependencies that are not pure Python (for example, compiled extensions)
- Hardware/OS variations
-
Being explicit about limitations and requirements helps manage expectations:
- Supported operating systems
- Tested configurations
- Required dependencies
- License considerations
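Stated limitations can also be checked in code, so that a user on an untested setup is told so up front. The sketch below assumes a hypothetical support matrix (`SUPPORTED_SYSTEMS`, `MIN_PYTHON`) and a hypothetical helper `check_supported`; the values are placeholders, not recommendations from the talk.

```python
import platform
import sys
import warnings

# Hypothetical support matrix; replace with the configurations actually tested.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}
MIN_PYTHON = (3, 9)


def check_supported(system=None, python_version=None) -> list:
    """Return a list of human-readable problems; empty when the setup is supported."""
    system = system or platform.system()
    python_version = tuple(python_version or sys.version_info[:2])
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"untested operating system: {system}")
    if python_version < MIN_PYTHON:
        found = ".".join(map(str, python_version))
        required = ".".join(map(str, MIN_PYTHON))
        problems.append(f"Python {found} is older than the required {required}")
    return problems


# Warn rather than fail: the code may still work, but it has not been tested here.
for problem in check_supported():
    warnings.warn(problem)
```

Whether to warn or to refuse to run is a design choice that depends on the level of reproducibility the team has decided to guarantee.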
-
Documentation should include complete installation steps, execution procedures, and all runtime details needed for reproduction