Talks - Vikram Waradpande: You've got trust issues, we've got solutions: Differential Privacy

Learn how Differential Privacy enables population data analysis while protecting individual privacy through noise injection, epsilon budgets, and the PyDP library.

Key takeaways
  • Differential Privacy allows analyzing population-level data while protecting individual privacy by adding controlled noise to query results

  • Key mechanisms include the Laplace, Gaussian, and exponential mechanisms; for the noise-adding ones, the noise scale is proportional to the query's sensitivity and inversely proportional to the privacy budget (epsilon)

  • The epsilon parameter controls the privacy-utility tradeoff: a smaller epsilon means more privacy but less accuracy; typical values range from 0.1 to 5

  • Simple data anonymization (removing names/SSNs) is insufficient, because linkage attacks can re-identify individuals by joining the data with auxiliary datasets

  • The PyDP library provides a Python implementation of differential privacy algorithms, including:

    • Bounded mean/sum calculations
    • Support for incremental computation on large datasets
    • Machine learning algorithm integration
    • Multiple noise mechanisms
  • Important considerations when implementing:

    • Understanding data sensitivity
    • Choosing appropriate epsilon values
    • Selecting the right DP algorithm for the use case
    • Evaluating accuracy requirements
    • Memory constraints with large datasets
  • Not suitable for:

    • Individual-level analysis
    • Fraud detection
    • Cases requiring exact answers
    • Very small datasets, where the noise would be too large relative to the signal
  • Two main approaches:

    • Local DP: noise is added to each individual's data before it is stored
    • Global DP: a trusted central database holds the raw data and adds noise to query results
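
To make the relationship between sensitivity, epsilon, and the two approaches concrete, here is a minimal sketch of a Laplace-noised bounded mean in both the global and the local model. It is not PyDP's API: the function names (laplace_noise, clamp, global_dp_mean, local_dp_mean) are hypothetical, it uses only the Python standard library, and it assumes that the dataset size is public and that neighboring datasets differ by replacing one record.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def clamp(value, lower, upper):
    """Force a value into [lower, upper] so the sensitivity is bounded."""
    return max(lower, min(upper, value))


def global_dp_mean(values, lower, upper, epsilon, rng):
    """Global model: a trusted curator sees raw data and noises the answer once."""
    n = len(values)
    true_mean = sum(clamp(v, lower, upper) for v in values) / n
    # Assumption: replacing one record moves the clamped mean
    # by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    # Noise scale = sensitivity / epsilon: smaller epsilon, more noise.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


def local_dp_mean(values, lower, upper, epsilon, rng):
    """Local model: every record is noised before it is stored or shared."""
    # Any two possible inputs differ by at most (upper - lower).
    scale = (upper - lower) / epsilon
    reports = [clamp(v, lower, upper) + laplace_noise(scale, rng) for v in values]
    return sum(reports) / len(reports)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [rng.uniform(18, 90) for _ in range(1000)]  # synthetic data
    print("exact mean:", sum(ages) / len(ages))
    for eps in (0.1, 1.0, 5.0):
        print(
            "epsilon", eps,
            "global:", global_dp_mean(ages, 0, 100, eps, rng),
            "local:", local_dp_mean(ages, 0, 100, eps, rng),
        )
```

Running the script for several epsilon values shows the tradeoff described above: estimates drift further from the exact mean as epsilon shrinks. For the same epsilon the local estimate is considerably noisier than the global one, which is the price of not needing a trusted database.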