We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Onyx: Functional, Distributed Data Processing for Clojure • Michael Drogalis • YOW! 2015
Learn how Onyx, a 100% Clojure distributed data processing platform, achieves high performance, fault tolerance, and flexible workflows through masterless architecture & DAGs
-
Onyx is a masterless, cloud-based distributed data processing platform written in 100% Clojure that supports both batch and streaming workflows
-
Uses a log-centric design approach with peer-to-peer communication, where peers communicate directly through segments rather than through a broker
-
Features fault tolerance through automatic failure detection and recovery using a ring-based approach where peers watch each other
-
Uses only 20 bytes per message in memory through an efficient bit sequence and XOR-based tracking system for message processing
-
Provides a workflow-based programming model using directed acyclic graphs (DAGs) to represent data flow between tasks
-
Achieves high performance - benchmark tests showed processing of half a million messages per second across 5 AWS c4.xlarge instances
-
Employs generative testing extensively to ensure correctness and find edge cases in distributed scenarios
-
Includes built-in garbage collection, back pressure handling, and metrics/monitoring capabilities
-
Offers plugins for various data stores including Datomic, Kafka, and S3
-
Uses Zookeeper for coordination and strong consistency in distributed operations while maintaining a masterless architecture
-
Separates structural workflow definitions from task implementations using catalogs, allowing for flexible program composition