Understanding Probabilistic Data Structures with 112,092 UFO Sightings - Guy Royse - NDC London 2023

Explore probabilistic data structures for accuracy and performance trade-offs in real-time problem-solving with UFO sighting examples.

Key takeaways

Probabilistic data structures are not accurate, but can be fast and space-efficient.
Bloom filters are cool because they can give you a fast “probable” answer.
You can add items to Bloom filters, and then ask if an item is in the filter without checking all the items.
If an item is not in the filter, it’s definitely not in the filter.
If an item is in the filter, it’s probably in the filter.
Bloom filters are O(1) for reads and O(n) for updates.
Redis has Bloom filters, as well as other probabilistic data structures like CountMinSketch and HyperLogLog.
CountMinSketch is good for counting frequent items.
HyperLogLog is good for counting unique items.
MinHash is a neat tool for finding similar sets of items.
Similarity can be calculated using Jaccard similarity and union and intersection operations.
You can use probabilistic data structures to track trending topics, count unique users, and more.
Deterministic data structures are accurate but slow.
Probabilistic data structures are fast, but may not always be accurate.
The author’s favorite photo in the talk is a photo of a UFO sighting.

More talks