Discrimination is Wrong: Improving Productivity • Edward Kmett • YOW! 2015

Learn how discrimination sorts data in O(n) time instead of O(n log n), using discriminators and category theory concepts to boost productivity in operations like sorting and joins.

Key takeaways

Discrimination is a technique that allows sorting and grouping data in linear O(n) time, breaking the typical O(n log n) sorting bound by not doing pairwise comparisons
Key benefits include building data structures like Map and Set in linear time instead of O(n log n), and performing fast table joins and grouping operations
The technique uses “discriminators” that break data into equivalence classes based on their keys, rather than comparing elements pairwise
American Flag Sort and Radix Sort are implementations of discrimination techniques that work top-down and bottom-up respectively to achieve linear time sorting
The concept builds on category theory ideas like monoids, functors, and natural transformations to create a robust theoretical foundation
The implementation uses unsafe IO operations and custom foreign prims in Haskell to achieve the required performance characteristics
Law requirements include maintaining stability (preserving original order within equivalence classes) and proper handling of sums/products in algebraic data types
Provides potential for streaming/productive implementations where results can start being produced before seeing entire input
Allows operations like nub (removing duplicates) to work in O(n) time instead of O(n²)
Originally developed by Fritz Henglein at the University of Copenhagen through several papers between 2007-2011

Discrimination is Wrong: Improving Productivity • Edward Kmett • YOW! 2015

More talks