Talks - Vikram Waradpande: You've got trust issues, we've got solutions: Differential Privacy

Python Security

Learn how Differential Privacy enables population data analysis while protecting individual privacy through noise injection, epsilon budgets, and the PyDP library.

Key takeaways

Differential Privacy allows analyzing population-level data while protecting individual privacy by adding calibrated noise to query results, so that the output reveals very little about whether any single individual's data was included
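Key mechanisms include the Laplace and Gaussian mechanisms, which add noise whose scale is proportional to the query's sensitivity and inversely proportional to the privacy budget (epsilon), and the exponential mechanism, which selects among candidate outputs with probabilities that depend on epsilon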
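Here is a minimal sketch of the Laplace mechanism that makes the noise-scaling rule concrete. It uses plain Python rather than PyDP. It assumes a counting query over a hypothetical list of ages. A count has sensitivity 1, because adding or removing one person changes it by at most 1. The helper names `laplace_noise` and `laplace_mechanism` are illustrative and not part of any library. The Gaussian and exponential mechanisms are not shown.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release `true_value` plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    # Larger sensitivity or a smaller epsilon both mean more noise.
    scale = sensitivity / epsilon
    return true_value + laplace_noise(scale, rng)


# Hypothetical dataset and query: how many people are 40 or older?
ages = [23, 35, 41, 29, 62, 57, 33]
true_count = sum(1 for age in ages if age >= 40)
rng = random.Random(0)
# A count changes by at most 1 when one person is added or removed: sensitivity 1.
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(f"true count: {true_count}, released: {noisy_count:.2f}")
```

A hand-rolled sampler like this is only for illustration. Naive floating-point noise sampling can leak information, which is one reason to rely on a vetted library such as PyDP.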
The epsilon parameter controls the privacy-utility tradeoff: smaller epsilon means more privacy but less accuracy; typical values range from 0.1 to 5
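The following worked example shows the tradeoff at the two ends of the quoted epsilon range. It assumes the Laplace mechanism on a counting query with sensitivity $\Delta f = 1$:

```latex
\[
\mathcal{M}(x) = f(x) + Z,\qquad Z \sim \mathrm{Laplace}(0, b),\qquad b = \frac{\Delta f}{\varepsilon},\qquad \operatorname{Var}(Z) = 2b^{2}
\]
\[
\Delta f = 1:\quad \varepsilon = 0.1 \Rightarrow b = 10,\ \sigma_Z = \sqrt{2}\cdot 10 \approx 14.1;
\qquad \varepsilon = 5 \Rightarrow b = 0.2,\ \sigma_Z = \sqrt{2}\cdot 0.2 \approx 0.28
\]
```

For Laplace noise, the expected absolute error equals $b$ itself. That is about 10 at epsilon = 0.1, versus 0.2 at epsilon = 5.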
Simple anonymization (removing names or SSNs) is insufficient, because linkage attacks can re-identify people by joining the remaining attributes with auxiliary datasets
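A linkage attack can be sketched in a few lines: join the "anonymized" release with a public dataset on the quasi-identifiers that both still contain. The records, field names and the `link` helper below are all hypothetical. ZIP code, birth date and sex are used because that combination is a commonly cited quasi-identifier.

```python
from typing import Dict, List, Tuple

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public auxiliary dataset that still carries names.
voter_roll = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02140", "birth_date": "1980-01-15", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(released: List[Dict[str, str]],
         auxiliary: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Re-identify released records whose quasi-identifiers match exactly one named person."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in auxiliary:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in released:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


print(link(medical, voter_roll))
```

Alice's record is re-identified because her quasi-identifiers match exactly one named voter. Bob's record is not, because his ZIP code differs. Removing the names did nothing to prevent the first match.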
The PyDP library provides a Python implementation of differential privacy algorithms, including:
- Bounded mean/sum calculations
- Support for incremental computation on large datasets
- Machine learning algorithm integration
- Multiple noise mechanisms
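The bounded mean and incremental computation listed above can be illustrated without relying on PyDP's exact API. The sketch below is a hypothetical `IncrementalBoundedMean` class, not PyDP's own. It clamps each value to user-supplied bounds and keeps only a running sum and count, so batches can be streamed in. It spends half of epsilon on a noisy sum and half on a noisy count. The even budget split and the sum-and-count design are assumptions; libraries may compute bounded means differently.

```python
import random
from typing import Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two independent exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class IncrementalBoundedMean:
    """Hypothetical DP bounded mean that consumes data in batches.

    Only a clamped running sum and a count are stored, so memory use does not
    grow with the number of entries.
    """

    def __init__(self, epsilon: float, lower: float, upper: float) -> None:
        if epsilon <= 0 or lower >= upper:
            raise ValueError("need epsilon > 0 and lower < upper")
        self.epsilon = epsilon
        self.lower = lower
        self.upper = upper
        self._sum = 0.0
        self._count = 0

    def add_entries(self, values: Iterable[float]) -> None:
        for v in values:
            # Clamping bounds each person's contribution, which bounds the sensitivity.
            self._sum += min(max(v, self.lower), self.upper)
            self._count += 1

    def result(self, rng: random.Random) -> float:
        # Split the budget evenly between sum and count (sequential composition).
        eps_half = self.epsilon / 2
        # Adding or removing one clamped value moves the sum by at most max(|lower|, |upper|).
        sum_sensitivity = max(abs(self.lower), abs(self.upper))
        noisy_sum = self._sum + laplace_noise(sum_sensitivity / eps_half, rng)
        noisy_count = self._count + laplace_noise(1.0 / eps_half, rng)
        mean = noisy_sum / max(noisy_count, 1.0)
        # Post-processing such as clamping to the known bounds costs no extra budget.
        return min(max(mean, self.lower), self.upper)


rng = random.Random(42)
dp_mean = IncrementalBoundedMean(epsilon=1.0, lower=0.0, upper=100.0)
for batch in ([34, 51, 29], [120, 45], [38, 60, 41]):  # 120 is clamped to 100
    dp_mean.add_entries(batch)
print(round(dp_mean.result(rng), 1))
```

Clamping is what makes the sensitivity finite. Without bounds, a single extreme value could shift the mean arbitrarily far, and no finite amount of noise would hide it.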
Important considerations when implementing:
- Understanding data sensitivity
- Choosing appropriate epsilon values
- Selecting the right DP algorithm for the use case
- Evaluating accuracy requirements
- Memory constraints with large datasets
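Choosing epsilon also means deciding how much of it each query may consume. Under basic sequential composition, running several DP queries on the same data costs the sum of their epsilons. The hypothetical `PrivacyBudget` class below is a minimal sketch of an accountant built on that rule. Real libraries may use tighter composition theorems.

```python
class PrivacyBudget:
    """Hypothetical accountant using basic sequential composition (epsilons add up)."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, epsilon: float) -> None:
        """Record a query's epsilon, or refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that e.g. 0.1 + 0.2 + 0.7 is not refused due to float rounding.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError(
                f"budget exhausted: {self.remaining():.3f} left, {epsilon} requested"
            )
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)  # e.g. a noisy mean
budget.charge(0.3)  # e.g. a noisy count
try:
    budget.charge(0.5)  # would bring the total to 1.3 > 1.0
except RuntimeError as err:
    print(err)
```

The last call is refused because 0.5 + 0.3 + 0.5 exceeds the total budget of 1.0. The spent amount is left unchanged.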
Not suitable for:
- Individual-level analysis
- Fraud detection
- Cases requiring exact answers
- Very small datasets where noise would be too large
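The small-dataset limitation follows from how sensitivity scales with the number of records. Assume a mean over n values clamped to [L, U]. Assume also that n is public and that neighboring datasets differ by one replaced record. Under these assumptions, the sensitivity and the Laplace scale for the mean are:

```latex
\[
\Delta\bar{x} = \frac{U - L}{n}, \qquad b = \frac{\Delta\bar{x}}{\varepsilon} = \frac{U - L}{n\,\varepsilon}
\]
\[
U - L = 100,\ \varepsilon = 1:\quad n = 10 \Rightarrow b = 10, \qquad n = 10\,000 \Rightarrow b = 0.01
\]
```

Take ages clamped to [0, 100] with epsilon = 1. Ten records give noise on the order of ten years, which swamps the answer. Ten thousand records give noise of about a hundredth of a year.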
Two main approaches:
- Local DP: each user's data is noised before it is collected or stored, so no trusted curator is needed
- Global DP: a trusted central database (curator) holds the raw data and adds noise to query results
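Local DP is classically illustrated by randomized response, sketched below for a yes/no survey question. Each user reports their true answer with probability e^ε / (1 + e^ε) and the opposite answer otherwise. The collector never sees raw answers, yet it can still recover an unbiased population estimate. The population and helper names are hypothetical. In the global model, a trusted curator would instead compute the exact count once and add Laplace noise to it, which typically gives much better accuracy for the same epsilon.

```python
import math
import random
from typing import List


def randomized_response(true_answer: bool, epsilon: float, rng: random.Random) -> bool:
    """Local DP: each user perturbs their own yes/no answer before sending it.

    The true answer is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise; the ratio of the two probabilities is e^eps, giving epsilon-local DP.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_answer if rng.random() < p_truth else not true_answer


def estimate_fraction(reports: List[bool], epsilon: float) -> float:
    """Unbiased estimate of the true 'yes' fraction from the noisy reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # observed = p * pi + (1 - p) * (1 - pi); solve for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)
# Hypothetical population where roughly 30% would answer "yes".
true_answers = [rng.random() < 0.3 for _ in range(50_000)]
reports = [randomized_response(a, epsilon=1.0, rng=rng) for a in true_answers]
print(round(sum(true_answers) / len(true_answers), 3),
      round(estimate_fraction(reports, 1.0), 3))
```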
