Developing Maintainable Data Pipelines With Jupyter and Ploomber | PyData Chicago | September Meetup
Discover how to develop maintainable data pipelines with Jupyter and Ploomber, featuring modularized code, scalable execution, and seamless Git management, ideal for research and industry applications.
- Notebook-based projects can be split across multiple files instead of one monolithic notebook.
- .ipynb files are supported and can be run at scale, but .py files are recommended because they are easier to manage with Git.
- Ploomber allows for modularized code, making collaboration easier.
- The tool can also run .py files as notebooks, executing a converted copy while leaving the input file untouched.
- Jupyter Notebooks can be used for research and industry deployment.
- Inputs are not limited to Python: Jupyter Notebooks, .py scripts, .R files, and SQL scripts can all serve as pipeline tasks.
- The tool allows for incremental builds, which speeds up the data analysis process.
- The tool can also embed tests, including data quality checks, directly in the pipeline.
- The tool provides a GUI-like interface for creating pipelines, making it easy to compose production-ready data workflows.
- The tool can be integrated with orchestration platforms such as Airflow, AWS Batch, and Kubernetes.
- The tool allows for custom naming of output files.
- The tool makes collaboration easier by allowing data scientists to work independently and then integrate their work.
- The tool can automatically detect dependencies and execute scripts in the correct order.
- The tool can be used for various data science tasks such as data cleaning, data transformation, and model training.
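A minimal `pipeline.yaml` sketching how such a pipeline might be declared (the task and file names below are illustrative, not from the talk):

```yaml
tasks:
  # A .py script run as a notebook: Ploomber converts it and saves
  # the executed copy as a product, leaving the source file untouched
  - source: scripts/clean.py
    product:
      nb: output/clean.ipynb
      data: output/clean.csv

  # A plain .ipynb notebook works as a task too
  - source: notebooks/train.ipynb
    product:
      nb: output/train.ipynb
```

Running `ploomber build` then executes the tasks in dependency order, skipping any whose outputs are already up to date.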
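Inside a .py task, dependencies and outputs are declared in a `parameters` cell that the tool reads to infer execution order. The sketch below (file names `clean.csv` / `features.csv` are assumptions for illustration) uses only the standard library so it also runs standalone:

```python
# %% tags=["parameters"]
# In a real pipeline, upstream is declared as a list of task names
# (e.g. upstream = ["clean"]) and Ploomber injects the resolved
# paths; dict placeholders here let the sketch run on its own.
upstream = {"clean": "clean.csv"}
product = "features.csv"

# %%
import csv
from pathlib import Path

# Standalone convenience: fabricate a tiny upstream product if it is
# missing (in a real run, the "clean" task would have produced it).
src = Path(upstream["clean"])
if not src.exists():
    src.write_text("id,value\n1,10\n2,20\n")

# %%
# Transform the upstream data and write this task's product.
with src.open() as f:
    rows = list(csv.DictReader(f))
for row in rows:
    row["value_doubled"] = str(int(row["value"]) * 2)

with open(product, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "value", "value_doubled"])
    writer.writeheader()
    writer.writerows(rows)
```

Because each task reads its inputs from `upstream` rather than hard-coded paths, data scientists can develop tasks independently and integrate them later.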
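The incremental-build idea can be sketched with a plain timestamp comparison. This is a deliberate simplification, not Ploomber's actual mechanism, which also tracks source-code changes rather than file times alone:

```python
from pathlib import Path

def is_outdated(source: Path, prod: Path) -> bool:
    """Return True when a task should run: its product is missing
    or older than its source. A toy version of the check an
    incremental build performs to skip up-to-date tasks."""
    if not prod.exists():
        return True
    return source.stat().st_mtime > prod.stat().st_mtime
```

Skipping tasks whose inputs have not changed is what speeds up iterative analysis: only the edited step and its downstream tasks re-run.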