Bridging the Chasm Between Research & Software Development • Linda Stougaard Nielsen • GOTO 2022

This talk explores the challenges and opportunities of collaboration between data scientists and software developers, and presents strategies for efficient, maintainable code reuse across research and production.

Key takeaways
  • Code is often duplicated because research code and production code are written separately.
  • Glue code separates concerns and connects different parts of the system.
  • Collaboration between data scientists and software developers is important for bridging the gap.
  • Researchers and engineers have different skill sets and approaches to problem-solving.
  • Code reuse is key to efficiency and reducing duplication.
  • Enforcing coding standards and best practices can help create maintainable code.
  • Data scientists and engineers should work together to design and implement code that is both efficient and maintainable.
  • Code organization and structure are important for understanding and maintaining complex systems.
  • Tools like MLflow can help simplify the process of writing and reusing code.
  • Code reuse can be achieved by creating reusable modules and APIs.
  • Training and experimentation code can be split into separate modules and run in parallel.
  • Clear documentation and well-structured code are important for understanding and maintaining complex systems.
  • Code reviews and testing are important for ensuring the quality of the code.
  • Data scientists and engineers should work together to implement solutions that meet the needs of both parties.
  • The machine learning lifecycle should be considered when implementing solutions.
  • Code reuse can be achieved by creating a pipeline that is reusable and scalable.
  • Collaboration between data scientists and engineers is important for solving complex problems.
  • The approach to coding should be tailored to the specific needs of the project.
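
The takeaways on reusable modules and glue code can be made concrete with a small sketch. The example below is not taken from the talk. It assumes a toy "model" (the mean of the training values plus an offset), and every name in it (MeanModel, train, mean_absolute_error, run_experiment, serve) is hypothetical. It only illustrates one possible layout: a shared core written once, with thin research glue and thin production glue calling into it, so that neither side keeps its own copy of the logic.

```python
"""Sketch: one shared core, reused by research and production through glue code.

All names are hypothetical and the "model" is deliberately trivial.
"""
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


# --- Shared core: written once, imported by both research and production ---

@dataclass(frozen=True)
class MeanModel:
    """Toy model: always predicts the training mean plus a fixed offset."""
    center: float
    offset: float = 0.0

    def predict(self) -> float:
        return self.center + self.offset


def train(values: Sequence[float], offset: float = 0.0) -> MeanModel:
    """Fit the toy model. Both the experiment and the service use this."""
    if not values:
        raise ValueError("training data must not be empty")
    return MeanModel(center=mean(values), offset=offset)


def mean_absolute_error(model: MeanModel, targets: Sequence[float]) -> float:
    """Evaluation metric shared by both sides."""
    prediction = model.predict()
    return mean(abs(t - prediction) for t in targets)


# --- Research glue: sweep a parameter, reusing the core unchanged ---

def run_experiment(
    train_values: Sequence[float],
    holdout: Sequence[float],
    offsets: Sequence[float],
) -> Dict[float, float]:
    """Return the holdout error for each candidate offset."""
    return {
        o: mean_absolute_error(train(train_values, o), holdout)
        for o in offsets
    }


# --- Production glue: adapt a request to the same core ---

def serve(request: dict, model: MeanModel) -> dict:
    """Wrap the core prediction in a hypothetical request/response shape."""
    return {"id": request["id"], "prediction": model.predict()}


if __name__ == "__main__":
    results = run_experiment([1.0, 2.0, 3.0], [3.0, 5.0], [0.0, 1.0])
    best_offset = min(results, key=results.get)
    model = train([1.0, 2.0, 3.0], best_offset)
    print(results)
    print(serve({"id": 7}, model))
```

In this layout, a change to train or mean_absolute_error reaches the experiment and the service at the same time, and the glue functions are the only places that know about experiment sweeps or request formats. A real project would likely add experiment tracking and model persistence around the same core. Those parts are omitted here.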