Jayce @ BETA - Big Data Engineering With Python and AWS | PyData Vermont 2024
Learn how BETA Technologies processes vast amounts of aircraft data using Python and AWS. Explore their data lake architecture, processing patterns, and tools for handling 11GB/hour of flight data.
-
BETA Technologies builds electric aircraft prototypes with extensive data collection capabilities: ~11 GB/hour of video and 570 million telemetry points/hour
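For scale, 570 million points/hour works out to roughly 158,000 telemetry points per second.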
-
Core data stack uses AWS services:
- S3 for data lake storage
- DynamoDB for metadata
- Redshift for data warehouse
- Fargate for container orchestration
- Managed Airflow for workflow orchestration
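
The talk notes don't include ingest code, but a minimal sketch of how a raw flight-log file might land in this stack could look like the following; the bucket name, table name, and key layout are assumptions for illustration, not BETA's actual pipeline:

```python
import boto3
from datetime import datetime, timezone

# Hypothetical resource names; the real bucket/table names are not in the notes.
BUCKET = "beta-flight-data-lake"
METADATA_TABLE = "flight-metadata"

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")


def ingest_flight_log(local_path: str, aircraft_id: str, flight_id: str) -> str:
    """Upload a raw log file to the S3 data lake and record its metadata in DynamoDB."""
    key = f"bronze/{aircraft_id}/{flight_id}/{local_path.rsplit('/', 1)[-1]}"

    # Raw ("bronze") data goes straight into S3.
    s3.upload_file(local_path, BUCKET, key)

    # A small metadata record makes the file discoverable later.
    dynamodb.Table(METADATA_TABLE).put_item(
        Item={
            "flight_id": flight_id,
            "aircraft_id": aircraft_id,
            "s3_key": key,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return key
```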
-
Data architecture follows medallion pattern:
- Bronze: raw ingested data
- Silver: processed queryable data
- Gold: transformed data ready for BI/dashboards
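
One way to read the medallion layers is as prefixes in the S3 data lake; the layout below is an illustrative assumption, not quoted from the talk:

```python
# Illustrative S3 key layout for the medallion layers
# (prefix names and path structure are assumptions, not from the talk).
LAYER_KEYS = {
    "bronze": "bronze/{aircraft_id}/{flight_id}/raw/{filename}",     # raw ingested data
    "silver": "silver/{aircraft_id}/{flight_id}/telemetry.parquet",  # processed, queryable
    "gold": "gold/fleet_metrics/date={date}/summary.parquet",        # BI/dashboard-ready
}

def layer_key(layer: str, **parts: str) -> str:
    """Render the S3 key for one medallion layer, e.g. layer_key("bronze", aircraft_id="SN2", ...)."""
    return LAYER_KEYS[layer].format(**parts)
```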
-
Uses micro-batch processing instead of Spark to handle time series data, avoiding complexity of distributed computing
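
The actual batch jobs aren't shown, but the micro-batch idea, processing bounded chunks of time series on a single machine rather than running a Spark cluster, can be sketched with pandas (the file layout and column names are assumed):

```python
import pandas as pd

def process_in_micro_batches(csv_path: str, out_prefix: str, rows_per_batch: int = 1_000_000) -> None:
    """Process a large telemetry file in bounded chunks instead of a distributed job.

    Each chunk fits comfortably in memory, so no Spark cluster is needed.
    """
    reader = pd.read_csv(csv_path, chunksize=rows_per_batch, parse_dates=["timestamp"])
    for i, chunk in enumerate(reader):
        # Per-batch processing: sort, de-duplicate, then persist as columnar Parquet.
        chunk = chunk.sort_values("timestamp").drop_duplicates()
        chunk.to_parquet(f"{out_prefix}/part-{i:05d}.parquet", index=False)
```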
-
All data is stored in “tall” format (timestamp, field, value) for flexibility and scalability
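
A small illustration of the tall layout: a wide frame with one column per sensor melts into (timestamp, field, value) rows, so adding a sensor adds rows rather than columns (the sensor names here are made up):

```python
import pandas as pd

# A small wide-format sample: one column per sensor channel (names are made up).
wide = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-07-01 12:00:00", "2024-07-01 12:00:01"]),
        "battery_voltage": [398.2, 398.1],
        "motor_rpm": [2150, 2162],
    }
)

# Tall format: one row per (timestamp, field, value) triple.
tall = wide.melt(id_vars="timestamp", var_name="field", value_name="value")
print(tall)
#             timestamp            field   value
# 0 2024-07-01 12:00:00  battery_voltage   398.2
# 1 2024-07-01 12:00:01  battery_voltage   398.1
# 2 2024-07-01 12:00:00        motor_rpm  2150.0
# 3 2024-07-01 12:00:01        motor_rpm  2162.0
```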
-
Grafana serves as primary BI tool, chosen for:
- Time series optimization
- Technical user base compatibility
- Interactive visualizations
- Video sync capabilities
-
Custom Python tooling handles data decoding and processing from multiple sources:
- Aircraft sensors
- Video feeds
- Simulation data
- Test environment data
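
The decoders themselves aren't detailed in the notes; as a rough sketch, decoding fixed-size binary telemetry records into tall rows might look like this, with the record layout and field naming being purely hypothetical:

```python
import struct
from typing import Iterator

# Hypothetical record layout: 4-byte unsigned sensor id, 8-byte float epoch
# timestamp, 8-byte float value, little-endian. The real on-aircraft format
# is not described in the talk notes.
RECORD = struct.Struct("<Idd")

def decode_records(payload: bytes) -> Iterator[dict]:
    """Yield (timestamp, field, value) dicts from a raw binary telemetry payload."""
    for sensor_id, ts, value in RECORD.iter_unpack(payload):
        yield {"timestamp": ts, "field": f"sensor_{sensor_id}", "value": value}
```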
-
Focus on enabling engineers rather than replacing them - ML models augment human expertise
-
Total data volume around 150 terabytes from two aircraft prototypes
-
System designed for extensibility with minimal downstream impact when adding new sensors or data sources
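
Because every record shares the same three columns, downstream consumers need no schema change when a new sensor or data source comes online; a minimal sketch of such a consumer (field names hypothetical):

```python
import pandas as pd

def field_series(tall: pd.DataFrame, field: str) -> pd.Series:
    """Extract one signal from a tall (timestamp, field, value) table.

    Rows from newly added sensors are simply extra (field, value) pairs,
    so this consumer is untouched when a new data source is wired in.
    """
    rows = tall[tall["field"] == field]
    return rows.set_index("timestamp")["value"].sort_index()
```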