Giuditta Parolini - The Hell, According to a Data Scientist | PyData Global 2023


Learn essential practices for ethical data science, from respecting the human experiences behind data to proper documentation, standardization and responsible AI implementation.

Key takeaways
  • Data scientists must recognize that data represents real people and their experiences, even when reduced to numbers: there are “tears behind the data”

  • Proper metadata and documentation are critical: datasets without sufficient metadata become effectively unusable

  • Standardization is essential when working with data:

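    • Use consistent country codes, currency codes and date/time formats
    • Follow ISO standards (for example ISO 3166 for countries, ISO 4217 for currencies, ISO 8601 for dates and times)
    • Produce well-formed CSV files (for example, following RFC 4180)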
    • Leave missing values empty rather than using placeholders
  • Machine readability should be a non-negotiable requirement when working with data

  • APIs should not be treated as magical solutions: they require proper expertise in web development and data engineering

  • AI should be approached critically:

    • Avoid using AI indiscriminately without understanding the reasoning behind it
    • Ensure AI is environmentally, economically and socially sustainable
    • Recognize the fundamental differences between human intelligence and machine computation
  • Technical competence matters: people should not advocate for or implement solutions they don’t fully understand

  • Data work requires careful attention to context and human factors:

    • Consider the real-world implications of the data
    • Maintain interpersonal respect and understanding
    • Remember that statistics represent human experiences
  • Data cleaning and standardization problems could often be avoided by following established standards and best practices

  • There’s a moral and intellectual imperative to handle data responsibly and ethically
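The standardization practices in the takeaways above (ISO codes, ISO 8601 dates, well-formed CSV, empty cells for missing values) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's code: the field names and the `to_row`/`write_csv` helpers are hypothetical, and the regular expressions only check the *shape* of a code (two or three uppercase letters), which is weaker than validating against the official ISO 3166-1 and ISO 4217 lists. The standard-library `csv` writer adds a header row, quotes fields only when needed, and ends lines with CRLF, as RFC 4180 describes.

```python
import csv
import io
import re
from datetime import date

# Hypothetical shape checks: a real pipeline should validate against the
# official ISO 3166-1 alpha-2 and ISO 4217 code lists, not just the pattern.
COUNTRY_RE = re.compile(r"[A-Z]{2}")   # e.g. "IT"
CURRENCY_RE = re.compile(r"[A-Z]{3}")  # e.g. "EUR"

# Hypothetical column layout for this example.
FIELDS = ["country", "currency", "date", "amount"]


def to_row(record):
    """Normalize one record (a dict); missing values become empty strings."""
    country = record.get("country")
    currency = record.get("currency")
    if country and not COUNTRY_RE.fullmatch(country):
        raise ValueError(f"not an ISO 3166-1 alpha-2 code: {country!r}")
    if currency and not CURRENCY_RE.fullmatch(currency):
        raise ValueError(f"not an ISO 4217 code: {currency!r}")
    day = record.get("date")
    amount = record.get("amount")
    return {
        "country": country or "",
        "currency": currency or "",
        # ISO 8601 calendar date, YYYY-MM-DD
        "date": day.isoformat() if day is not None else "",
        # Leave missing numbers empty instead of placeholders like -999 or "N/A"
        "amount": "" if amount is None else str(amount),
    }


def write_csv(records):
    """Return CSV text with a header row and one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)  # default "excel" dialect
    writer.writeheader()
    writer.writerows(to_row(r) for r in records)
    return buf.getvalue()


if __name__ == "__main__":
    print(write_csv([
        {"country": "IT", "currency": "EUR", "date": date(2024, 1, 31), "amount": 12.5},
        {"country": "DE", "currency": "EUR"},  # date and amount unknown
    ]))
```

Rejecting a value such as `"Italy"` when the file is written, rather than cleaning it afterwards, is one way to act on the point that many data cleaning problems can be avoided by following established standards.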