Onyx: Functional, Distributed Data Processing for Clojure • Michael Drogalis • YOW! 2015

Learn how Onyx, a 100% Clojure distributed data processing platform, achieves high performance, fault tolerance, and flexible workflows through masterless architecture & DAGs

Key takeaways
  • Onyx is a masterless, cloud-based distributed data processing platform written in 100% Clojure that supports both batch and streaming workflows

  • Uses a log-centric design approach with peer-to-peer communication, where peers communicate directly through segments rather than through a broker

  • Features fault tolerance through automatic failure detection and recovery using a ring-based approach where peers watch each other

  • Uses only 20 bytes per message in memory through an efficient bit sequence and XOR-based tracking system for message processing

  • Provides a workflow-based programming model using directed acyclic graphs (DAGs) to represent data flow between tasks

  • Achieves high performance - benchmark tests showed processing of half a million messages per second across 5 AWS c4.xlarge instances

  • Employs generative testing extensively to ensure correctness and find edge cases in distributed scenarios

  • Includes built-in garbage collection, back pressure handling, and metrics/monitoring capabilities

  • Offers plugins for various data stores including Datomic, Kafka, and S3

  • Uses Zookeeper for coordination and strong consistency in distributed operations while maintaining a masterless architecture

  • Separates structural workflow definitions from task implementations using catalogs, allowing for flexible program composition