Rob Romijnders - Differential Privacy Made Practical | PyData Amsterdam 2024

Learn how differential privacy protects individual data while enabling machine learning - a practical guide to implementing privacy-preserving data science with Python

Key takeaways
  • Differential privacy aims to protect individual data while allowing collective learning by adding controlled noise to results

  • Epsilon (ε) is the key privacy parameter:

    • ε=1 is considered the “gold standard” for good privacy protection
    • ε=3 or higher provides weak privacy guarantees
    • Lower epsilon means stronger privacy but more noise/reduced utility
  • The Laplace distribution is commonly used for adding noise because it yields an exact mathematical ε-DP guarantee when its scale is set in proportion to the query's sensitivity
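
For intuition, here is a minimal sketch of the Laplace mechanism (the function and values below are illustrative, not from the talk): the noise scale is sensitivity/ε, so lower ε (stronger privacy) directly means more noise.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with epsilon-DP via Laplace noise.

    The noise scale is sensitivity / epsilon: a lower epsilon
    (stronger privacy) means a larger scale and a noisier answer.
    """
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

# A counting query has sensitivity 1: adding or removing one person
# changes the count by at most 1.
true_count = 4213  # hypothetical value
print(laplace_mechanism(true_count, sensitivity=1, epsilon=1.0))  # noise scale 1
print(laplace_mechanism(true_count, sensitivity=1, epsilon=0.1))  # noise scale 10
```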

  • Key applications include:

    • Contact tracing apps
    • Deep learning models
    • LLM fine-tuning
    • Census data
    • Medical records
  • Privacy budget concept:

    • Each query uses up some of the privacy budget
    • Multiple queries require dividing budget across operations
    • Pre-training on public data helps preserve budget for private fine-tuning
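
A toy sketch of budget accounting under basic sequential composition (the `PrivacyBudget` class is a hypothetical illustration, not a real library API): each answered query subtracts its ε from the total, and queries beyond the budget are refused.

```python
import numpy as np

class PrivacyBudget:
    """Toy accountant using basic sequential composition: the epsilons
    of all answered queries sum to at most the total budget."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def answer(self, true_value, sensitivity, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        return true_value + np.random.laplace(scale=sensitivity / epsilon)

budget = PrivacyBudget(total_epsilon=1.0)
noisy_count = budget.answer(4213, sensitivity=1, epsilon=0.5)      # spends half
noisy_sum = budget.answer(180_000, sensitivity=100, epsilon=0.5)   # spends the rest
# A third query would now raise RuntimeError: the budget is gone.
```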
  • Trade-offs exist between:

    • Privacy protection vs utility/accuracy
    • Dataset size vs amount of noise needed (see the sketch after this list)
    • Number of queries vs privacy preservation
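
One way to see the dataset-size trade-off: for a differentially private mean over n records, the sensitivity falls as 1/n, so with a fixed ε the noise shrinks as the dataset grows. A toy sketch with assumed bounds and values:

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    """epsilon-DP mean of values clipped to [lower, upper].

    The mean's sensitivity is (upper - lower) / n, so for a fixed
    epsilon the added noise shrinks as the dataset grows.
    """
    values = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    return values.mean() + np.random.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    ages = rng.uniform(18, 90, size=n)
    print(n, round(dp_mean(ages, lower=18, upper=90, epsilon=1.0), 3))
```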
  • Simple anonymization or k-anonymity-style rules (e.g., only answering queries about groups of more than 50 people) are not sufficient for privacy protection
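
To see why such thresholds fail, consider a differencing attack, sketched below with toy data: two queries that each cover well over 50 people can still expose a single individual exactly.

```python
# Both queries pass a "groups must have more than 50 people" check,
# yet their difference reveals one person's salary exactly.
salaries = {f"user_{i}": 50_000 + 100 * i for i in range(100)}  # toy data

q_all = sum(salaries.values())                                      # 100 people
q_without = sum(v for k, v in salaries.items() if k != "user_42")   # 99 people
print(q_all - q_without)  # exactly user_42's salary
```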

  • Practical implementations exist in:

    • TensorFlow Privacy
    • Opacus (PyTorch's differential privacy library)
    • Android telemetry
    • Apple QuickType
    • Government census data
  • Composition theorems track how privacy loss adds up across multiple operations (the budget sketch above uses the simplest, basic composition); splitting the budget across many steps can significantly reduce model utility

  • Empirical privacy protection is often stronger than theoretical bounds, but theoretical guarantees are still important
