Onyx: Functional, Distributed Data Processing for Clojure • Michael Drogalis • YOW! 2015

Learn how Onyx, a 100% Clojure distributed data processing platform, achieves high performance, fault tolerance, and flexible workflows through masterless architecture & DAGs

Key takeaways

Onyx is a masterless, cloud-based distributed data processing platform written in 100% Clojure that supports both batch and streaming workflows
Uses a log-centric design approach with peer-to-peer communication, where peers communicate directly through segments rather than through a broker
Features fault tolerance through automatic failure detection and recovery using a ring-based approach where peers watch each other
Uses only 20 bytes per message in memory through an efficient bit sequence and XOR-based tracking system for message processing
Provides a workflow-based programming model using directed acyclic graphs (DAGs) to represent data flow between tasks
Achieves high performance - benchmark tests showed processing of half a million messages per second across 5 AWS c4.xlarge instances
Employs generative testing extensively to ensure correctness and find edge cases in distributed scenarios
Includes built-in garbage collection, back pressure handling, and metrics/monitoring capabilities
Offers plugins for various data stores including Datomic, Kafka, and S3
Uses Zookeeper for coordination and strong consistency in distributed operations while maintaining a masterless architecture
Separates structural workflow definitions from task implementations using catalogs, allowing for flexible program composition

Onyx: Functional, Distributed Data Processing for Clojure • Michael Drogalis • YOW! 2015

More talks