Jayce @ BETA - Big Data Engineering With Python and AWS | PyData Vermont 2024

Learn how BETA Technologies processes vast amounts of aircraft data using Python and AWS. Explore their data lake architecture, processing patterns, and tools for handling 11GB/hour of flight data.

Key takeaways
  • BETA Technologies builds electric aircraft prototypes with extensive data collection capabilities - ~11 GB/hour of video and 570 million telemetry points/hour

  • Core data stack uses AWS services:

    • S3 for data lake storage
    • DynamoDB for metadata
    • Redshift for data warehouse
    • Fargate for container orchestration
    • Managed Airflow for workflow orchestration
  • Data architecture follows medallion pattern:

    • Bronze: raw ingested data
    • Silver: processed queryable data
    • Gold: transformed data ready for BI/dashboards
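As a minimal sketch of how a medallion layout can map onto S3 keys, the snippet below partitions objects by layer, source, and date. The bucket layout, layer names as prefixes, and source names are illustrative assumptions, not BETA's actual key scheme:

```python
from datetime import datetime, timezone

# Hypothetical medallion-style S3 key layout; prefixes and
# source names are illustrative, not BETA's actual scheme.
LAYERS = ("bronze", "silver", "gold")

def lake_key(layer: str, source: str, ts: datetime, filename: str) -> str:
    """Build an S3 object key partitioned by layer, source, and date."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}/{source}/{ts:%Y/%m/%d}/{filename}"

ts = datetime(2024, 7, 1, tzinfo=timezone.utc)
print(lake_key("bronze", "telemetry", ts, "flight_0042.parquet"))
# bronze/telemetry/2024/07/01/flight_0042.parquet
```

Date-based partitioning like this keeps raw (bronze) and processed (silver/gold) copies of the same flight addressable side by side, so a reprocessing job only needs to rewrite keys under one prefix.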
  • Uses micro-batch processing instead of Spark for time series data, avoiding the complexity of distributed computing
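The micro-batch idea above can be sketched as splitting a flight's time range into fixed windows and processing each window independently on a single worker. The 15-minute window size and the generator shape are assumptions for illustration:

```python
from datetime import datetime, timedelta

def micro_batches(start: datetime, end: datetime, window: timedelta):
    """Yield (batch_start, batch_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + window, end)
        cursor += window

# One hour of telemetry split into 15-minute micro-batches.
start = datetime(2024, 7, 1, 0, 0)
end = datetime(2024, 7, 1, 1, 0)
batches = list(micro_batches(start, end, timedelta(minutes=15)))
print(len(batches))  # 4
```

Because each window is self-contained, batches can run sequentially on Fargate tasks or be fanned out by Airflow without any Spark-style shuffle or cluster coordination.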

  • All data is stored in “tall” format (timestamp, field, value) for flexibility and scalability
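A tall layout means every sensor reading becomes its own (timestamp, field, value) row, so adding a new sensor adds rows rather than columns. A minimal sketch of the wide-to-tall conversion, with made-up field names:

```python
# Convert one "wide" telemetry record into "tall" rows of
# (timestamp, field, value). Field names here are illustrative.
def to_tall(record: dict) -> list[tuple]:
    ts = record["timestamp"]
    return [(ts, field, value)
            for field, value in record.items()
            if field != "timestamp"]

row = {"timestamp": "2024-07-01T12:00:00Z",
       "altitude_ft": 1200.0,
       "battery_pct": 87.5}
rows = to_tall(row)
# [('2024-07-01T12:00:00Z', 'altitude_ft', 1200.0),
#  ('2024-07-01T12:00:00Z', 'battery_pct', 87.5)]
```

The same transformation is what `pandas.melt` does at scale; the payoff is that downstream queries and schemas never change when a new field starts appearing in the stream.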

  • Grafana serves as primary BI tool, chosen for:

    • Time series optimization
    • Technical user base compatibility
    • Interactive visualizations
    • Video sync capabilities
  • Custom Python tooling handles data decoding and processing from multiple sources:

    • Aircraft sensors
    • Video feeds
    • Simulation data
    • Test environment data
  • Focus on enabling engineers rather than replacing them - ML models augment human expertise

  • Total data volume around 150 terabytes from two aircraft prototypes

  • System designed for extensibility with minimal downstream impact when adding new sensors or data sources