Hooncheol Shin: The easiest way to collaborate on Jupyter | JupyterCon 2023

Discover the simplest way to collaborate on Jupyter notebooks with Link Git, enhancing reproducibility, conflict resolution, and standardization for data scientists and machine learning engineers.

Key takeaways
  • Jupyter is widely used by data scientists, but its flexibility leads to diverse usage habits, which makes it hard to review the notebook changes introduced by each commit.
  • Version control of notebook files is difficult because .ipynb files are JSON documents that store code together with outputs and metadata, and because of how notebooks are typically used.
  • Collaborating on Jupyter can be problematic, with issues such as merge conflicts, results that are hard to reproduce, and a lack of standardization.
  • Introducing pipelines in Jupyter can improve reproducibility and collaboration by giving each cell an identity and allowing execution results to be cached.
  • Link Git is a tool that helps resolve conflicts in the Jupyter editor and provides a way to collaborate seamlessly on Jupyter.
  • The easiest way to collaborate on Jupyter is to use Link Git, which offers features such as pipeline building, caching, and remote execution.
  • Machine learning engineers also collaborate extensively on their code and results, and Link Git can help improve their workflow.
  • The goal is to contribute to a more collaborative environment on the Jupyter platform.
  • The speaker suggests using Jupyter only for prototyping machine learning projects, since it lacks a robust framework for making code reusable and maintainable.
  • The community has been experimenting with text-based notebooks for quite some time, and there is a pull request to enable a text-based format for Jupyter notebooks.
  • Link is a pipeline-building tool that provides features such as remote execution, hyperparameter optimization, and parallel execution.
  • The speaker recommends installing Link with the pip install mrx-link command.
  • Link Git is also available through the pip install mrx-link command.
  • The speaker’s company, MakinaRocks, has developed ML products such as Runway and Link, and is constantly thinking about how to improve collaboration within the realm of MLOps.
  • The speaker’s goal is to propose a way to collaborate seamlessly on Jupyter.
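The JSON-format problem in the takeaways can be made concrete with a small sketch. It uses only the Python standard library and shows two common mitigations. The first blanks the run-time fields (outputs and execution counts) before committing. The second renders cells as plain text, similar in spirit to the percent format used by text-based notebook tools. The function names are hypothetical. This is a general illustration, not how Link Git handles diffs.

```python
import copy
import json


def strip_volatile_fields(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts cleared.

    These fields change whenever a cell is re-run, so they show up as diff
    noise even when no source code changed. (Hypothetical helper.)
    """
    clean = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


def to_percent_script(nb: dict) -> str:
    """Render cells as plain text with one '# %%' marker per cell (hypothetical helper)."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # notebook JSON may store source as a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            # Comment out markdown lines so the result is still a valid script.
            body = "\n".join("# " + line if line else "#" for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        else:
            chunks.append("# %%\n" + source)
    return "\n\n".join(chunks) + "\n"


def make_notebook(run: int) -> dict:
    # Same source and output; only the execution count differs between runs.
    return {
        "nbformat": 4, "nbformat_minor": 5, "metadata": {},
        "cells": [
            {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": "# Load data"},
            {"cell_type": "code", "id": "load", "metadata": {}, "execution_count": run,
             "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
             "source": "x = 1\nprint(x)"},
        ],
    }


a, b = make_notebook(1), make_notebook(2)
print(json.dumps(a) == json.dumps(b))  # False: the execution counts differ
print(json.dumps(strip_volatile_fields(a)) == json.dumps(strip_volatile_fields(b)))  # True
print(to_percent_script(a))
```

Once volatile fields are removed, or the notebook is kept in a line-oriented text form, re-running a notebook no longer produces a diff by itself. Line-based merges also become workable.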
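The pipeline takeaway (give cells identities and cache their execution results) can be illustrated with a minimal sketch. It rests on a few assumptions:

- Each cell has a stable id.
- Each cell lists its parent cells.
- Each cell stores its result in a variable named `out`.

The cache key hashes the cell id, its source, and its parents' keys. Editing a cell therefore invalidates that cell and everything downstream of it, while untouched upstream cells are reused. `Cell`, `CellCache`, and `run_pipeline` are hypothetical names. This is not Link's actual API or implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Cell:
    cell_id: str  # stable identity, independent of the cell's position in the notebook
    source: str   # code that reads parent results by id and assigns `out`
    parents: List[str] = field(default_factory=list)


class CellCache:
    """Hypothetical in-memory cache of cell results keyed by identity and inputs."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.misses = 0  # number of times a cell actually had to execute

    @staticmethod
    def key(cell_id: str, source: str, upstream_keys: List[str]) -> str:
        h = hashlib.sha256()
        for part in [cell_id, source, *upstream_keys]:
            h.update(part.encode("utf-8"))
            h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
        return h.hexdigest()

    def run(self, key: str, fn: Callable[[], Any]) -> Any:
        if key not in self._store:
            self.misses += 1
            self._store[key] = fn()
        return self._store[key]


def _execute(source: str, inputs: Dict[str, Any]) -> Any:
    # exec is used only to keep the illustration short; do not run untrusted code this way.
    scope = dict(inputs)
    exec(source, scope)
    return scope["out"]


def run_pipeline(cells: List[Cell], cache: CellCache) -> Dict[str, Any]:
    """Run cells in the given (already topologically sorted) order, reusing cached results.

    Results are shared rather than copied, so cells should not mutate their inputs.
    """
    keys: Dict[str, str] = {}
    results: Dict[str, Any] = {}
    for cell in cells:
        keys[cell.cell_id] = cache.key(cell.cell_id, cell.source, [keys[p] for p in cell.parents])
        inputs = {p: results[p] for p in cell.parents}
        results[cell.cell_id] = cache.run(keys[cell.cell_id], lambda: _execute(cell.source, inputs))
    return results


cache = CellCache()
cells = [Cell("load", "out = [1, 2, 3]"), Cell("total", "out = sum(load)", ["load"])]
print(run_pipeline(cells, cache))  # {'load': [1, 2, 3], 'total': 6}
run_pipeline(cells, cache)         # second run is served entirely from the cache
print(cache.misses)                # 2
```

Because each key chains in its parents' keys, the cache behaves like a content-addressed dependency graph. A collaborator who changes only a downstream cell re-executes just that cell.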