Discrimination is Wrong: Improving Productivity • Edward Kmett • YOW! 2015

Learn how discrimination sorts data in O(n) time instead of O(n log n), using discriminators and category theory concepts to boost productivity in operations like sorting and joins.

Key takeaways
  • Discrimination is a technique that allows sorting and grouping data in linear O(n) time, breaking the typical O(n log n) sorting bound by not doing pairwise comparisons

  • Key benefits include building data structures like Map and Set in linear time instead of O(n log n), and performing fast table joins and grouping operations

  • The technique uses “discriminators” that break data into equivalence classes based on their keys, rather than comparing elements pairwise

  • American Flag Sort and Radix Sort are implementations of discrimination techniques that work top-down and bottom-up respectively to achieve linear time sorting

  • The concept builds on category theory ideas like monoids, functors, and natural transformations to create a robust theoretical foundation

  • The implementation uses unsafe IO operations and custom foreign prims in Haskell to achieve the required performance characteristics

  • Law requirements include maintaining stability (preserving original order within equivalence classes) and proper handling of sums/products in algebraic data types

  • Provides potential for streaming/productive implementations where results can start being produced before seeing entire input

  • Allows operations like nub (removing duplicates) to work in O(n) time instead of O(n²)

  • Originally developed by Fritz Henglein at the University of Copenhagen through several papers between 2007-2011