We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Understanding Probabilistic Data Structures with 112,092 UFO Sightings - Guy Royse - NDC London 2023
Explore probabilistic data structures for accuracy and performance trade-offs in real-time problem-solving with UFO sighting examples.
- Probabilistic data structures are not accurate, but can be fast and space-efficient.
- Bloom filters are cool because they can give you a fast “probable” answer.
- You can add items to Bloom filters, and then ask if an item is in the filter without checking all the items.
- If an item is not in the filter, it’s definitely not in the filter.
- If an item is in the filter, it’s probably in the filter.
- Bloom filters are O(1) for reads and O(n) for updates.
- Redis has Bloom filters, as well as other probabilistic data structures like CountMinSketch and HyperLogLog.
- CountMinSketch is good for counting frequent items.
- HyperLogLog is good for counting unique items.
- MinHash is a neat tool for finding similar sets of items.
- Similarity can be calculated using Jaccard similarity and union and intersection operations.
- You can use probabilistic data structures to track trending topics, count unique users, and more.
- Deterministic data structures are accurate but slow.
- Probabilistic data structures are fast, but may not always be accurate.
- The author’s favorite photo in the talk is a photo of a UFO sighting.