PyData Chicago: Running Notebooks in Production? Blessing or Curse? by Eduardo Blancas

"Explore the benefits and challenges of using Jupyter notebooks in production. Discover best practices for maintaining and testing notebooks, managing versioning, and scheduling notebooks. By Eduardo Blancas at PyData Chicago."

Key takeaways
  1. Jupyter is both a file format (notebooks are stored as JSON) and a platform for editing and running notebooks.
  2. Long notebooks can be hard to maintain; clear, concise code organized in a standard structure makes them easier to work with.
  3. Writing code that avoids mutation, for example functions that do not modify the arguments passed to them, helps prevent problems with notebooks in production.
  4. Following a common code style, enforced by a formatter such as Black, improves readability and maintainability.
  5. Testing notebooks can be challenging, but libraries such as testbook let you access functions defined in a notebook and test them.
  6. Data pipelines can help break down data analysis logic into manageable pieces and test them in isolation.
  7. External libraries used in notebooks should be listed with their exact versions recorded to prevent versioning issues.
  8. Data snapshots can help keep track of the expected outputs of notebooks.
  9. Formatting notebooks and storing them in alternative formats, such as .py files in which cells are marked by comment annotations, can improve their maintainability and interoperability in production.
  10. Notebooks can be scheduled by executing them with a tool such as Papermill, triggered from any scheduler that can run bash commands.
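One reading of takeaway 3 is that functions should return new objects instead of modifying the arguments they receive, so that re-running a cell does not silently change data produced by an earlier cell. The minimal sketch below contrasts the two styles; the function names are hypothetical and this interpretation of the talk's point is an assumption.

```python
def add_tax_in_place(prices, rate):
    """Mutating version: changes the caller's list, so running the cell twice applies the tax twice."""
    for i, price in enumerate(prices):
        prices[i] = round(price * (1 + rate), 2)
    return prices


def add_tax(prices, rate):
    """Non-mutating version: builds a new list and leaves the input untouched."""
    return [round(price * (1 + rate), 2) for price in prices]


prices = [10.0, 20.0]
taxed = add_tax(prices, 0.1)
taxed_again = add_tax(prices, 0.1)  # re-running the "cell" gives the same answer
print(prices, taxed)  # [10.0, 20.0] [11.0, 22.0]
```

With the non-mutating version, the order and number of times cells are executed no longer affects the result.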
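Because an .ipynb file is plain JSON (takeaway 1), functions defined in a notebook can be tested without opening Jupyter (takeaway 5). testbook does this by running the notebook in a kernel; the sketch below is a simpler, hand-rolled stand-in, not testbook's API. It assumes nbformat v4 JSON and executes only code cells that begin with a function definition, skipping expensive cells. The notebook contents and `clean_names` are hypothetical.

```python
import json

# A tiny notebook in nbformat v4 layout, as it would be stored on disk.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Cleaning\n"]},
        {
            "cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
            "source": [
                "def clean_names(names):\n",
                "    return [n.strip().lower() for n in names]\n",
            ],
        },
        {
            "cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
            "source": ["df = load_huge_table()  # expensive, skipped when testing\n"],
        },
    ],
})


def load_functions(notebook_text):
    """Execute only code cells that start with `def` and return the resulting namespace."""
    notebook = json.loads(notebook_text)
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])  # source may be a list of lines or a single string
        if source.lstrip().startswith("def "):
            exec(source, namespace)
    return namespace


funcs = load_functions(NOTEBOOK_JSON)
print(funcs["clean_names"]([" Ada ", "GRACE"]))  # ['ada', 'grace']
```

Filtering on `def` is a crude heuristic chosen for brevity; it breaks if a function cell also depends on variables defined in skipped cells.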
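Takeaway 6 suggests splitting a long analysis into small steps that each take an input and return an output, so every step can be tested on its own. The sketch below assumes a simple linear pipeline in plain Python; the step names and data are hypothetical, and dedicated pipeline tools add features such as caching and parallel execution that are omitted here.

```python
from __future__ import annotations

from typing import Callable, Iterable


def extract() -> list[dict]:
    """Hypothetical source step; a real pipeline would read from a file or database."""
    return [{"city": "Chicago", "temp_f": 50.0}, {"city": "Boston", "temp_f": 41.0}]


def transform(rows: list[dict]) -> list[dict]:
    """Convert Fahrenheit to Celsius, returning new rows instead of editing the input."""
    return [{"city": r["city"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)} for r in rows]


def summarize(rows: list[dict]) -> float:
    """Average Celsius temperature across all rows."""
    return round(sum(r["temp_c"] for r in rows) / len(rows), 1)


def run_pipeline(source: Callable[[], object], steps: Iterable[Callable[[object], object]]) -> object:
    """Run the source step, then feed each step's output into the next one."""
    result = source()
    for step in steps:
        result = step(result)
    return result


print(run_pipeline(extract, [transform, summarize]))  # 7.5
```

Each function can be unit-tested with small hand-written inputs, for example checking that `transform` maps 212 °F to 100 °C, without running the whole analysis.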
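For takeaway 7, one lightweight option is to record the exact installed version of every library the notebook uses, in requirements.txt style. The sketch below relies on the standard library's `importlib.metadata`; the package list is hypothetical, and tools such as `pip freeze` or a lock file are equally valid ways to pin versions.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return "name==version" lines for installed packages.

    Packages that are not installed are reported as a comment instead of a pin.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


# Hypothetical libraries used by the notebook (distribution names, e.g. "scikit-learn", not "sklearn").
print("\n".join(pinned_requirements(["pandas", "matplotlib"])))
```

Writing this output next to the notebook gives the production environment an exact list to install from.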
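Takeaway 8 can be implemented as a snapshot test: the first run stores the notebook's output, and later runs compare against that stored copy so unexpected changes are flagged. A minimal sketch, assuming the output is JSON-serializable; the helper and file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(result, snapshot_path):
    """Compare `result` with a stored JSON snapshot, creating the snapshot on the first run.

    Returns True if the snapshot was just created or matches, False if the output changed.
    """
    path = Path(snapshot_path)
    if not path.exists():
        path.write_text(json.dumps(result, indent=2, sort_keys=True))
        return True
    return json.loads(path.read_text()) == result


with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "summary.json"
    print(check_snapshot({"rows": 2, "mean": 7.5}, snap))  # True (snapshot created)
    print(check_snapshot({"rows": 2, "mean": 7.5}, snap))  # True (matches)
    print(check_snapshot({"rows": 3, "mean": 7.5}, snap))  # False (output changed)
```

Because the comparison goes through JSON, tuples come back as lists and dictionary keys as strings, so results should be stored in those types.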
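Takeaway 9 mentions storing notebooks as .py files with annotations. A common convention is the "percent" format, where each cell starts with a `# %%` comment and markdown cells start with `# %% [markdown]` followed by commented lines; Jupytext can convert between this format and .ipynb. The converter below is a hand-rolled illustration of the idea, not Jupytext's implementation; it assumes nbformat v4 JSON and drops outputs and metadata.

```python
import json


def to_percent_script(notebook_text):
    """Convert notebook JSON into a percent-format .py script (outputs dropped, raw cells skipped)."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"]).rstrip("\n")
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + source)
        elif cell["cell_type"] == "markdown":
            # Markdown lines become comments so the file stays valid Python.
            commented = "\n".join(f"# {line}" if line else "#" for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "nbformat": 4, "nbformat_minor": 4, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})
print(to_percent_script(example))
```

The resulting file diffs cleanly in version control and can be checked with ordinary Python tooling such as a formatter or linter.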
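For takeaway 10, Papermill runs a notebook with new parameter values: it looks for the cell tagged `parameters` and inserts a cell with the overriding values after it (at the top if no such cell exists), then executes the result. A scheduler such as cron can call its command-line interface, for example `papermill report.ipynb output.ipynb -p report_date 2023-01-01`. The standard-library sketch below reproduces only the injection step to show how it works; it is not Papermill's code and does not execute anything. The notebook and parameter names are hypothetical.

```python
import copy


def inject_parameters(notebook, params):
    """Return a copy of `notebook` with a new code cell that assigns `params`.

    The cell goes right after the cell tagged "parameters" (or at the top if there
    is none), so its assignments override the defaults defined there.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook unchanged
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, new_cell)
    return nb


template = {
    "nbformat": 4, "nbformat_minor": 4, "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "execution_count": None, "outputs": [], "source": ["report_date = None\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print(report_date)\n"]},
    ],
}
run = inject_parameters(template, {"report_date": "2023-01-01"})
print("".join(run["cells"][1]["source"]))  # report_date = '2023-01-01'
```

Keeping defaults in a single tagged cell means the same notebook works interactively and as a scheduled job that receives different values on each run.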