Probabilistic data structures by Andrea Iacono

Discover the power of probabilistic data structures, including Bloom Filters, Counting Bloom Filters, Cuckoo Filters, and Hyper Log Log, and learn how they can be used in applications like fraud detection and membership checking.

Key takeaways

Probabilistic Data Structures

  • Bloom Filter: A probabilistic data structure that checks whether an item is in a set by mapping it to positions in a bit array; it may return false positives, but never false negatives (first sketch after this list)
  • Counting Bloom Filter: An extension of the Bloom Filter that replaces the bit array with an array of counters, which makes deletions possible (second sketch after this list)
  • Cuckoo Filter: A data structure that uses cuckoo hashing to store short fingerprints of items, supporting fast lookup and deletion (third sketch after this list)
  • Hyper Log Log (HLL): A data structure used to estimate the cardinality of a set
  • Applications: Probabilistic data structures can be used in various applications, including fraud detection, membership checking, and caching
  • Properties: They provide fast insertion, lookup, and, in some variants, deletion, but answers are approximate: false positives are possible, and variants that support deletion can also introduce false negatives if used incorrectly
  • Trade-offs: There is a trade-off between memory usage and false positive rate: allocating more memory lowers the false positive rate
  • Practical Implementation: Libraries implementing these data structures are available in Java and other languages
  • Example: In a project involving an oil refinery, a Bloom Filter was used to check whether specific items belonged to a set
  • Efficiency: Cuckoo Filters are memory-efficient; in the example discussed, a filter used only around 1.2 megabytes of memory
  • Correctness: Probabilistic data structures do not guarantee exact answers; instead, they return approximate results with a quantifiable probability of error
  • Cardinality Estimation: Hyper Log Log estimates the number of distinct elements using a small, fixed number of registers rather than storing the elements themselves
  • Heuristics: Heuristics, such as falling back to linear counting when the estimate is small, can be used to improve the accuracy of cardinality estimation (last sketch after this list)
  • Limitations: Probabilistic data structures may not be suitable for all applications, and their limitations should be carefully considered
  • Best Practices: It’s important to choose the right data structure and trade-off between memory usage and false positive rate based on the specific use case
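
Code sketches

The following is a minimal Bloom Filter sketch in Java, not a production implementation: it uses java.util.BitSet and derives the k bit positions from String.hashCode() with a per-index seed (a real implementation would use a stronger hash such as MurmurHash). The class and method names are illustrative.

```java
import java.util.BitSet;

// Minimal Bloom Filter sketch: k hash positions per item are set in a bit
// array; membership checks can yield false positives, never false negatives.
public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public BloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Illustrative per-seed hash; real implementations use a proper hash
    // function (e.g. MurmurHash) with double hashing.
    private int index(String item, int i) {
        return Math.floorMod((item + "#" + i).hashCode(), size);
    }

    public void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false -> definitely not present; true -> probably present.
    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(1 << 16, 4);
        filter.add("alice@example.com");
        System.out.println(filter.mightContain("alice@example.com")); // true
        System.out.println(filter.mightContain("bob@example.com"));   // false, most likely
    }
}
```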
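A Counting Bloom Filter can be sketched by swapping the bit array for an array of counters; the version below reuses the same illustrative hashing scheme and is only meant to show why deletions become possible.

```java
// Counting Bloom Filter sketch: each hash position holds a counter instead of
// a bit, so previously added items can be removed by decrementing.
public class CountingBloomFilter {
    private final int[] counts;
    private final int size;
    private final int numHashes;

    public CountingBloomFilter(int size, int numHashes) {
        this.counts = new int[size];
        this.size = size;
        this.numHashes = numHashes;
    }

    // Illustrative per-seed hash, as in the Bloom Filter sketch above.
    private int index(String item, int i) {
        return Math.floorMod((item + "#" + i).hashCode(), size);
    }

    public void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            counts[index(item, i)]++;
        }
    }

    // Only remove items that were actually added; decrementing counters for
    // items never inserted is what leads to false negatives.
    public void remove(String item) {
        for (int i = 0; i < numHashes; i++) {
            int idx = index(item, i);
            if (counts[idx] > 0) {
                counts[idx]--;
            }
        }
    }

    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (counts[index(item, i)] == 0) {
                return false;
            }
        }
        return true;
    }
}
```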
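The next sketch illustrates partial-key cuckoo hashing as used by Cuckoo Filters: each item is reduced to a short fingerprint stored in one of two candidate buckets, and on collision an existing fingerprint is evicted and relocated. Bucket size, fingerprint width, and the mixing constants here are arbitrary choices for illustration, not tuned values.

```java
import java.util.Random;

// Minimal Cuckoo Filter sketch: stores 8-bit fingerprints in one of two
// candidate buckets. The bucket count must be a power of two so either
// bucket can be derived from the other via XOR with the fingerprint hash.
public class CuckooFilter {
    private static final int BUCKET_SIZE = 4;   // slots per bucket
    private static final int MAX_KICKS = 500;   // relocation attempts before giving up

    private final byte[][] buckets;             // 0 marks an empty slot
    private final int numBuckets;               // must be a power of two
    private final Random random = new Random();

    public CuckooFilter(int numBuckets) {
        this.numBuckets = numBuckets;
        this.buckets = new byte[numBuckets][BUCKET_SIZE];
    }

    // 8-bit fingerprint of the item; 0 is reserved for "empty".
    private byte fingerprint(String item) {
        byte f = (byte) ((item.hashCode() * 0x9E3779B9) >>> 24);
        return f == 0 ? (byte) 1 : f;
    }

    private int bucketIndex(int hash) {
        return hash & (numBuckets - 1);
    }

    // Applying this twice with the same fingerprint returns the original
    // index, which is what makes lookup and deletion cheap.
    private int altIndex(int index, byte fingerprint) {
        return bucketIndex(index ^ (fingerprint * 0x5BD1E995));
    }

    public boolean insert(String item) {
        byte f = fingerprint(item);
        int i1 = bucketIndex(item.hashCode());
        int i2 = altIndex(i1, f);
        if (placeInBucket(i1, f) || placeInBucket(i2, f)) {
            return true;
        }
        // Both candidate buckets are full: evict a random fingerprint and
        // try to relocate it to its own alternate bucket, repeatedly.
        int index = random.nextBoolean() ? i1 : i2;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            int slot = random.nextInt(BUCKET_SIZE);
            byte evicted = buckets[index][slot];
            buckets[index][slot] = f;
            f = evicted;
            index = altIndex(index, f);
            if (placeInBucket(index, f)) {
                return true;
            }
        }
        return false; // the filter is effectively full
    }

    public boolean mightContain(String item) {
        byte f = fingerprint(item);
        int i1 = bucketIndex(item.hashCode());
        return bucketContains(i1, f) || bucketContains(altIndex(i1, f), f);
    }

    public boolean delete(String item) {
        byte f = fingerprint(item);
        int i1 = bucketIndex(item.hashCode());
        return removeFromBucket(i1, f) || removeFromBucket(altIndex(i1, f), f);
    }

    private boolean placeInBucket(int index, byte f) {
        for (int slot = 0; slot < BUCKET_SIZE; slot++) {
            if (buckets[index][slot] == 0) {
                buckets[index][slot] = f;
                return true;
            }
        }
        return false;
    }

    private boolean bucketContains(int index, byte f) {
        for (byte stored : buckets[index]) {
            if (stored == f) return true;
        }
        return false;
    }

    private boolean removeFromBucket(int index, byte f) {
        for (int slot = 0; slot < BUCKET_SIZE; slot++) {
            if (buckets[index][slot] == f) {
                buckets[index][slot] = 0;
                return true;
            }
        }
        return false;
    }
}
```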
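Finally, a compact Hyper Log Log-style sketch: items are hashed, the top p bits select a register, and each register keeps the longest run of leading zeros observed; the harmonic mean of the registers yields the cardinality estimate. The linear-counting fallback shows the kind of heuristic mentioned in the takeaways. The constants follow the standard formulation, but this is an illustrative sketch, not a tuned implementation.

```java
// Minimal Hyper Log Log sketch. Precision p controls both memory
// (2^p registers) and accuracy (standard error roughly 1.04 / sqrt(2^p)).
public class HyperLogLog {
    private final int p;            // number of index bits
    private final int m;            // number of registers = 2^p
    private final int[] registers;  // each holds the max leading-zero run seen

    public HyperLogLog(int p) {
        this.p = p;
        this.m = 1 << p;
        this.registers = new int[m];
    }

    public void add(String item) {
        int hash = mix(item.hashCode());
        int register = hash >>> (32 - p);      // top p bits choose a register
        int rest = hash << p;                  // remaining bits
        int rank = Math.min(Integer.numberOfLeadingZeros(rest) + 1, 32 - p + 1);
        registers[register] = Math.max(registers[register], rank);
    }

    public double estimate() {
        double sum = 0;
        int zeroRegisters = 0;
        for (int r : registers) {
            sum += 1.0 / (1L << r);
            if (r == 0) zeroRegisters++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m);  // standard bias-correction constant
        double raw = alpha * m * m / sum;
        // Heuristic from the original HyperLogLog paper: for small estimates,
        // fall back to linear counting, which is more accurate in that range.
        if (raw <= 2.5 * m && zeroRegisters > 0) {
            return m * Math.log((double) m / zeroRegisters);
        }
        return raw;
    }

    // Cheap integer mixer standing in for a real hash such as MurmurHash.
    private static int mix(int h) {
        h ^= h >>> 16; h *= 0x85EBCA6B;
        h ^= h >>> 13; h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        HyperLogLog hll = new HyperLogLog(14);  // 2^14 registers, a few kilobytes
        for (int i = 0; i < 1_000_000; i++) {
            hll.add("user-" + i);
        }
        System.out.printf("estimated distinct items: %.0f%n", hll.estimate());
    }
}
```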