Rob Romijnders - Differential Privacy Made Practical | PyData Amsterdam 2024
Learn how differential privacy protects individual data while enabling machine learning - a practical guide to implementing privacy-preserving data science with Python
-
Differential privacy aims to protect individual data while allowing collective learning by adding controlled noise to results
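Formally (the standard definition, not specific to this talk): a mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in a single record, and any set of outputs S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]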
-
Epsilon (ε) is the key privacy parameter:
- ε=1 is often cited as the "gold standard" for strong privacy protection
- ε=3 or higher provides only weak guarantees (the probability-ratio bound e^ε is already ≈ 20 at ε=3)
- Lower epsilon means stronger privacy but more noise/reduced utility
-
The Laplace distribution is commonly used for the noise because calibrating its scale to sensitivity/ε gives an exact mathematical privacy guarantee: the more a single record can change the query result (the sensitivity), the more noise is added
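A minimal sketch of the Laplace mechanism for a count query (toy data and numpy only; this is illustrative, not code from the talk):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def laplace_count(data, predicate, epsilon, sensitivity=1.0):
    """Release a count under ε-DP: adding or removing one record changes
    a count by at most 1, so the sensitivity is 1."""
    true_count = sum(predicate(x) for x in data)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = rng.integers(18, 90, size=1000)  # toy dataset
for eps in (1.0, 3.0):
    # Lower ε -> larger noise scale -> stronger privacy, lower accuracy.
    print(eps, laplace_count(ages, lambda a: a > 65, eps))
```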
-
Key applications include:
- Contact tracing apps
- Deep learning models
- LLM fine-tuning
- Census data
- Medical records
-
Privacy budget concept (a minimal tracker is sketched after this list):
- Each query uses up some of the privacy budget
- Multiple queries require dividing budget across operations
- Pre-training on public data helps preserve budget for private fine-tuning
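A toy budget tracker under basic sequential composition (hypothetical helper class, not from any DP library):

```python
class PrivacyBudget:
    """Track epsilon spent across queries. Under basic sequential
    composition, the total cost is the sum of per-query epsilons."""
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)   # first query
budget.charge(0.5)   # second query uses the rest
# budget.charge(0.1) would now raise: the budget is gone.
```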
-
Trade-offs exist between:
- Privacy protection vs utility/accuracy
- Dataset size vs amount of noise needed (see the sketch after this list)
- Number of queries vs privacy preservation
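To make the dataset-size trade-off concrete: for a mean over values bounded in [0, 1], one record moves the result by at most 1/n, so the Laplace scale needed for a fixed ε shrinks as the dataset grows (toy numbers, same mechanism as above):

```python
epsilon = 1.0
for n in (100, 10_000, 1_000_000):
    # Sensitivity of a bounded mean is 1/n, so the noise scale is 1/(n*ε):
    # a 10,000x larger dataset needs 10,000x less noise at the same ε.
    scale = 1.0 / (n * epsilon)
    print(f"n={n:>9,}  Laplace scale={scale:.6f}")
```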
-
Simple anonymization, or k-anonymity-style rules such as only answering queries on groups of more than 50 people, is not sufficient for privacy protection: two allowed queries can be differenced to isolate a single individual (see the sketch below)
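A toy differencing attack against a ">50 only" rule (hypothetical data and names):

```python
salaries = {f"person_{i}": 40_000 + 100 * i for i in range(60)}  # toy data

def avg_salary(names):
    assert len(names) > 50, "policy: only answer queries on groups > 50"
    return sum(salaries[n] for n in names) / len(names)

everyone = list(salaries)
without_target = [n for n in everyone if n != "person_7"]

# Both queries pass the group-size check, yet subtracting the two sums
# reveals person_7's exact salary.
leaked = (len(everyone) * avg_salary(everyone)
          - len(without_target) * avg_salary(without_target))
print(leaked, salaries["person_7"])
```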
-
Practical implementations exist in:
- TensorFlow Privacy
- Opacus for PyTorch (see the sketch after this list)
- Android telemetry
- Apple QuickType
- Government census data
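A minimal DP-SGD training sketch with Opacus; the API shown is from Opacus 1.x and the model/data are placeholders, so treat this as an outline rather than the talk's code:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(10, 2)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)    # placeholder data

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,  # noise added to the clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()

# Budget spent so far; δ is chosen by the analyst.
print("epsilon spent:", engine.get_epsilon(delta=1e-5))
```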
-
Composition theorems track how the privacy budget accumulates across multiple operations; spreading a fixed budget over many queries or training steps can significantly reduce model utility
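For concreteness, the two standard composition bounds, stated from memory (the advanced bound is the Dwork-Rothblum-Vadhan form; worth verifying before relying on it):

```python
import math

def basic_composition(epsilon, k):
    # Sequential composition: k queries at ε each cost k*ε in total.
    return k * epsilon

def advanced_composition(epsilon, k, delta_prime):
    # Tighter total ε for many queries, at the price of an extra δ'
    # failure probability added to the guarantee.
    return (epsilon * math.sqrt(2 * k * math.log(1 / delta_prime))
            + k * epsilon * (math.exp(epsilon) - 1))

print(basic_composition(0.1, 100))           # 10.0
print(advanced_composition(0.1, 100, 1e-6))  # ≈ 6.3, tighter for large k
```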
-
Empirical privacy protection, as measured by practical attacks, is often stronger than the worst-case theoretical bounds, but the theoretical guarantees remain important