Casari, Cruz, & Vargas - Data Tales from an Open Source Research Team | PyData Global 2023

Learn how an open source research team combines data science, community insights, and human context to measure success beyond metrics and drive meaningful outcomes.

Key takeaways
  • Data should serve to understand people, technology and ideas - it’s not just about collecting metrics but understanding the context and human elements behind them

  • Just because data is easily available doesn’t make it the right data source - carefully evaluate if the data actually answers the key questions and provides meaningful insights

  • Bot activity can significantly skew metrics - on GitHub, bots make up less than 1% of actors yet generate up to 24% of pull request events, so they need careful filtering and analysis

  • Focus on outcomes over outputs - measuring raw metrics isn’t enough; you need to understand what success looks like and tie data to meaningful business/community outcomes

  • Social media and sentiment analysis have limitations - tools may not catch nuanced context or be appropriate for all use cases, especially around sensitive topics or early warning signals

  • Change management requires understanding user needs - data should inform how to make changes convenient and valuable for users, as changing habits is difficult

  • Present alternatives when data shows the current approach isn’t working - don’t just say no, provide other paths forward with supporting evidence

  • Consider multiple data sources - platforms like GitHub don’t capture all open source activity, so combine them with other community spaces for a more comprehensive understanding

  • Account for automation in metrics - understand how your organization uses automation tools and how they impact your measurements

  • Tell the story behind the data - raw numbers need context and narrative to drive understanding and decision making effectively
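
The takeaway about bot activity can be made concrete with a small sketch. The Python below is not the speakers' method. It assumes a hypothetical list of event records with `actor` and `type` fields, and it uses a simple heuristic (logins ending in `[bot]`, the suffix GitHub gives to app accounts) that will miss automation running under ordinary user accounts. The sample data is invented for illustration only.

```python
# Hypothetical sample events; real data would come from your own event export.
EVENTS = [
    {"actor": "alice", "type": "PullRequestEvent"},
    {"actor": "dependabot[bot]", "type": "PullRequestEvent"},
    {"actor": "dependabot[bot]", "type": "PullRequestEvent"},
    {"actor": "bob", "type": "PullRequestEvent"},
    {"actor": "carol", "type": "IssuesEvent"},
    {"actor": "renovate[bot]", "type": "PullRequestEvent"},
]


def is_bot(login):
    """Heuristic only (an assumption): logins ending in "[bot]" are automation."""
    return login.endswith("[bot]")


def bot_share(events, event_type="PullRequestEvent"):
    """Return the share of actors and of events attributable to bots."""
    selected = [e for e in events if e["type"] == event_type]
    actors = {e["actor"] for e in selected}
    bot_actors = {a for a in actors if is_bot(a)}
    bot_events = sum(1 for e in selected if is_bot(e["actor"]))
    return {
        "actor_share": len(bot_actors) / len(actors) if actors else 0.0,
        "event_share": bot_events / len(selected) if selected else 0.0,
    }


def human_events(events):
    """Drop events from accounts the heuristic flags as bots."""
    return [e for e in events if not is_bot(e["actor"])]


if __name__ == "__main__":
    print(bot_share(EVENTS))
    print(len(human_events(EVENTS)), "events remain after filtering")
```

Reporting the actor share next to the event share shows how a small number of automated accounts can dominate an activity metric. Whether to filter them out depends on the question being asked, since automation is sometimes part of what you want to measure.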