Arthur Andres - Unified batch and stream processing in python | PyData Global 2023
Discover Beavers, a Python library for unified batch and stream processing, and learn how it simplifies data flow orchestration with reusable code, flexible data representation, and zero-copy concatenation.
- Batch and stream processing in Python can be cumbersome to manage.
- Existing libraries for streaming are not suitable for many industries.
- Beavers is a Python library that allows for unified batch and stream processing.
- Beavers lets the same code be reused for both batch and stream jobs.
- Beavers uses Kafka as a message broker and Apache Arrow for columnar in-memory data representation.
- Beavers can be used to create a DAG (directed acyclic graph) to orchestrate data flow.
- Beavers includes stream nodes, computational nodes, and sink nodes.
- Stream nodes compute ephemeral events, computational nodes execute Python functions, and sink nodes provide output.
- Beavers can be used to replay messages from Kafka topics in the correct order.
- Beavers allows for flexible data representation and can be used with various serialization formats.
- Beavers provides a type-safe environment and fast zero-copy concatenation of tables.
- Beavers can be used to integrate with existing data sources and systems.
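To make the DAG and node vocabulary above more concrete, here is a minimal pure-Python sketch. It is not Beavers' API: `MiniDag`, `Node`, `source`, `stream`, `sink`, `execute`, `build` and `double_positive` are hypothetical names chosen for illustration. Stream values are cleared at every cycle (ephemeral), computational nodes run plain Python functions, and sink nodes produce output. Running the same graph once over a whole batch, or once per incoming chunk, gives the same result, which is the kind of reuse between batch and stream jobs described above.

```python
from __future__ import annotations

from typing import Callable, Optional


class Node:
    """One node of the graph; `value` holds what it produced in the current cycle."""

    def __init__(self, func: Optional[Callable], inputs: list[Node], is_sink: bool = False):
        self.func = func        # None for source nodes
        self.inputs = inputs    # upstream nodes this node reads from
        self.is_sink = is_sink  # sinks only produce side effects (output)
        self.value: list = []   # ephemeral: cleared at the start of every cycle


class MiniDag:
    """Hypothetical mini-DAG with source, stream and sink nodes (not Beavers' API)."""

    def __init__(self) -> None:
        # A node can only depend on nodes created before it, so
        # creation order is always a valid topological order.
        self.nodes: list[Node] = []

    def _add(self, node: Node) -> Node:
        self.nodes.append(node)
        return node

    def source(self) -> Node:
        return self._add(Node(None, []))

    def stream(self, func: Callable[..., list], *inputs: Node) -> Node:
        return self._add(Node(func, list(inputs)))

    def sink(self, func: Callable[[list], None], node: Node) -> Node:
        return self._add(Node(func, [node], is_sink=True))

    def execute(self, source: Node, events: list) -> None:
        """Run one cycle: feed `events` into `source`, then propagate downstream."""
        for node in self.nodes:
            node.value = []  # stream values do not survive between cycles
        source.value = list(events)
        for node in self.nodes:
            if node.func is None:
                continue  # source nodes were filled above
            result = node.func(*(n.value for n in node.inputs))
            if not node.is_sink:
                node.value = result


def double_positive(values: list[int]) -> list[int]:
    """Business logic written once, shared by the batch and the stream job."""
    return [2 * v for v in values if v > 0]


def build(output: list[int]) -> tuple[MiniDag, Node]:
    dag = MiniDag()
    src = dag.source()
    doubled = dag.stream(double_positive, src)
    dag.sink(output.extend, doubled)
    return dag, src


events = [3, -1, 4, 0, 5]

# Batch job: a single cycle over the whole dataset
batch_out: list[int] = []
dag, src = build(batch_out)
dag.execute(src, events)

# Stream job: one cycle per incoming micro-batch
stream_out: list[int] = []
dag, src = build(stream_out)
for chunk in ([3, -1], [4], [0, 5]):
    dag.execute(src, chunk)

print(batch_out, stream_out)  # [6, 8, 10] [6, 8, 10]
```

The batch and stream outputs match only because `double_positive` treats each event independently; logic that accumulates across cycles would need state that persists between cycles, which this sketch deliberately leaves out.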
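Replaying several Kafka topics in the correct order amounts to merging streams that are each already ordered by time into a single time-ordered stream. The sketch below shows that idea with the standard library only: it does not use a Kafka client, the `Message` record and `replay` function are hypothetical, and it assumes each input list is already sorted by timestamp. It is an illustration of the ordering problem, not a description of how Beavers implements replay.

```python
from __future__ import annotations

import heapq
from typing import Iterator, NamedTuple


class Message(NamedTuple):
    """Hypothetical message record; real Kafka messages carry more metadata."""

    timestamp: int  # e.g. milliseconds since the epoch
    topic: str
    value: str


def replay(*topics: list[Message]) -> Iterator[Message]:
    """Lazily merge per-topic message lists into one timestamp-ordered stream.

    Assumes each input list is already sorted by timestamp.
    """
    return heapq.merge(*topics, key=lambda m: m.timestamp)


prices = [Message(1, "prices", "p1"), Message(4, "prices", "p2")]
orders = [
    Message(2, "orders", "o1"),
    Message(3, "orders", "o2"),
    Message(5, "orders", "o3"),
]

replayed = [m.value for m in replay(prices, orders)]
print(replayed)  # ['p1', 'o1', 'o2', 'p2', 'o3']
```

`heapq.merge` is lazy, so only one pending message per topic is held in memory at a time, which matters when replaying long topic histories.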
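Zero-copy concatenation means building the combined table out of references to the existing column chunks instead of copying their values. In Arrow itself, `pyarrow.concat_tables` does this for tables that share a schema. The standard-library sketch below only illustrates the principle with a hypothetical `ChunkedColumn` class, a simplified analogue of Arrow's chunked columns rather than their actual implementation or anything specific to Beavers.

```python
from __future__ import annotations

from array import array


class ChunkedColumn:
    """Hypothetical column stored as a list of chunks (compare Arrow's ChunkedArray)."""

    def __init__(self, chunks: list[array]) -> None:
        self.chunks = chunks  # chunks are treated as immutable once created

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def to_list(self) -> list:
        # Materializes a copy; only needed for inspection
        return [v for chunk in self.chunks for v in chunk]

    @staticmethod
    def concat(*columns: ChunkedColumn) -> ChunkedColumn:
        # Only the outer list of references is new: every chunk is shared, not copied
        return ChunkedColumn([chunk for column in columns for chunk in column.chunks])


a = ChunkedColumn([array("d", [1.0, 2.0])])
b = ChunkedColumn([array("d", [3.0]), array("d", [4.0, 5.0])])
combined = ChunkedColumn.concat(a, b)

print(len(combined))                      # 5
print(combined.chunks[0] is a.chunks[0])  # True: same buffer, no copy
```

The cost of `concat` grows with the number of chunks rather than the number of rows. The trade-off is that a column built from many small appends becomes fragmented; Arrow offers `Table.combine_chunks` to compact such tables when that becomes a problem.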