Fabian Höring - Building reproducible distributed applications at scale
Learn to build reproducible distributed applications at scale with Python virtual environments, conda, and Apache Hadoop YARN.
- Use Python virtual environments to create reproducible applications at scale
- Conda and pip can be used together, but require specific rules to avoid issues
- Use a scheduler like Apache Hadoop YARN to manage distributed computing
- Use pickle to serialize Python functions, but beware of issues with non-deterministic behavior
- cloudpickle is a library that extends pickle to serialize Python objects the standard module cannot handle, such as lambdas and interactively defined functions
- Spark tries to execute Python tasks, but can struggle with issues like package inconsistencies and version differences
- Use conda to create virtual environments and then upload them to distributed storage
- PEX (Python EXecutable) is a tool that bundles a Python environment into a single file, which helps manage package environments and deployment
- Use a “build-as-you-go” approach to avoid dependency issues and improve reproducibility
- Docker can be used to create consistent environments, but requires careful management of dependencies
- TensorFlow and Dask are examples of tools that can be used in distributed computing
- The Python pickle module is used to serialize Python objects, but can have issues with non-deterministic behavior
- Using Python virtual environments and package managers like conda can help improve reproducibility and ease deployment of distributed applications
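
The points about pickle and cloudpickle can be illustrated with a minimal sketch using only the standard library. It is not taken from the talk; the helper name `try_pickle` is hypothetical. It shows that plain pickle stores a function as a reference to its module and name, so a function with no importable name (here a lambda) cannot be serialized. This is the gap cloudpickle fills by serializing such functions by value.

```python
import math
import pickle


def try_pickle(obj):
    """Hypothetical helper: return pickled bytes, or None if pickle refuses."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        # The exact exception type depends on the Python version and on
        # where the object was defined, so both are handled here.
        return None


# A function that can be imported by name is pickled as a reference
# (module name + function name), not as code.
by_reference = try_pickle(math.sqrt)
restored = pickle.loads(by_reference)

# A lambda has no importable name, so plain pickle cannot serialize it.
anonymous = try_pickle(lambda x: x + 1)

print(restored is math.sqrt)
print(anonymous is None)
```

Because only a reference is stored, the receiving worker must be able to import the same module, which is one reason the environment shipped to the cluster has to match the one on the client.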