Soham Butala - Prefect Workflows for Scaling Acoustic Fisheries Survey Pipelines | PyData Global 2023
Learn how Echoflow uses Prefect to orchestrate and scale acoustic fisheries data pipelines through modular design, distributed processing, and cloud integration.
- Echoflow is a Python package that uses Prefect to orchestrate and scale data processing pipelines for acoustic fisheries surveys
- Key components include Echopype for raw data processing, Echoshader for visualization, and Echoregions for geospatial analysis
- Configuration is managed through two YAML files:
  - Data source configuration
  - Processing steps and workflow definitions
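The split between the two files can be sketched as follows. This is a minimal illustration, not the package's actual schema: the keys (`name`, `source_urlpath`, `stages`, and so on) and the `validate_configs` helper are hypothetical, and plain dictionaries stand in for what a YAML parser would return from each file.

```python
# Hypothetical contents of the two configuration files, shown as the
# dictionaries a YAML parser would produce. Key names are illustrative only.
dataset_config = {
    "name": "example_survey",
    "source_urlpath": "s3://example-bucket/raw/*.raw",  # placeholder location
    "output_urlpath": "s3://example-bucket/processed/",
}

pipeline_config = {
    "stages": [
        {"name": "open_raw", "options": {}},
        {"name": "compute_sv", "options": {}},
    ],
}

REQUIRED_DATASET_KEYS = {"name", "source_urlpath", "output_urlpath"}


def validate_configs(dataset: dict, pipeline: dict) -> list[str]:
    """Return the ordered stage names, raising ValueError on malformed input."""
    missing = REQUIRED_DATASET_KEYS - dataset.keys()
    if missing:
        raise ValueError(f"dataset config is missing keys: {sorted(missing)}")
    stages = pipeline.get("stages", [])
    if not stages:
        raise ValueError("pipeline config defines no stages")
    names = [stage["name"] for stage in stages]
    if len(names) != len(set(names)):
        raise ValueError("stage names must be unique")
    return names


print(validate_configs(dataset_config, pipeline_config))
```

Under this assumed layout, pointing the same workflow at a different survey would only require swapping the data source file.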
- The system offers flexible deployment options:
  - Local machine execution
  - Cloud platforms (AWS, Azure, GCP)
  - Docker containers
  - EC2 instances
- Built on Prefect's architecture, with:
  - Flows and tasks for pipeline organization
  - Storage blocks for code storage
  - Infrastructure blocks for the execution environment
  - REST API support
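To make the flow and task vocabulary concrete, here is a minimal sketch of a Prefect flow. `flow` and `task` are Prefect's decorators; everything else (`convert_file`, `combine`, `survey_pipeline`, the file names) is hypothetical and not taken from the package. The `ImportError` fallback only lets the sketch run where Prefect is not installed; it provides no orchestration, retries, or logging.

```python
try:
    from prefect import flow, task
except ImportError:
    # Fallback for environments without Prefect: no-op decorators that
    # return the function unchanged (no retries, no run tracking).
    def task(fn=None, **_options):
        return fn if fn is not None else (lambda f: f)

    flow = task


@task(retries=2, retry_delay_seconds=1)
def convert_file(path: str) -> str:
    """Stand-in for converting one raw sonar file; returns the output name."""
    return path.replace(".raw", ".zarr")


@task
def combine(outputs: list[str]) -> dict:
    """Stand-in for a step that summarises the converted files."""
    return {"n_files": len(outputs), "outputs": sorted(outputs)}


@flow(name="survey-pipeline")
def survey_pipeline(paths: list[str]) -> dict:
    # With Prefect installed, each task call below is tracked as its own run.
    converted = [convert_file(p) for p in paths]
    return combine(converted)


if __name__ == "__main__":
    print(survey_pipeline(["b.raw", "a.raw"]))
```

The flow is the unit that gets scheduled and deployed, while tasks are the individually retried and logged steps inside it.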
- Distributed processing is provided through the Dask and Ray libraries
- Handles large-scale sonar data processing (250+ terabytes) with parallel execution
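The fan-out pattern behind parallel execution can be illustrated without a cluster. The sketch below deliberately swaps in the standard library's `concurrent.futures` thread pool for Dask or Ray, so it shows only the shape of the computation (independent files mapped to workers, results gathered in input order), not how the package configures either library. `process_file`, `process_all`, and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(path: str) -> tuple[str, int]:
    """Stand-in for per-file work; here it just measures the name length."""
    return path, len(path)


def process_all(paths: list[str], max_workers: int = 4) -> dict[str, int]:
    # The files are independent, so each one can go to a separate worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever the completion order.
        return dict(pool.map(process_file, paths))


print(process_all(["leg1_0001.raw", "leg1_0002.raw", "leg2_0001.raw"]))
```

With Dask or Ray the same map-then-gather shape applies, but the workers can run on other machines rather than in local threads.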
- Provides a modular design that allows custom processing stages and workflow modifications
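One common way to get this kind of modularity is a stage registry, where each processing stage is registered under a name and a workflow is just an ordered list of names. The sketch below is an assumption about how such a design could look, not the package's implementation; `register_stage`, `run_workflow`, and the stage names are hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]
STAGES: dict[str, Stage] = {}


def register_stage(name: str) -> Callable[[Stage], Stage]:
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Stage) -> Stage:
        if name in STAGES:
            raise ValueError(f"stage {name!r} is already registered")
        STAGES[name] = fn
        return fn
    return decorator


@register_stage("open_raw")
def open_raw(data: dict) -> dict:
    return {**data, "opened": True}


@register_stage("apply_mask")  # a custom stage added without touching the runner
def apply_mask(data: dict) -> dict:
    return {**data, "masked": True}


def run_workflow(stage_names: list[str], data: dict) -> dict:
    """Run the named stages in order, passing each result to the next."""
    for name in stage_names:
        if name not in STAGES:
            raise KeyError(f"unknown stage: {name}")
        data = STAGES[name](data)
    return data


print(run_workflow(["open_raw", "apply_mask"], {"file": "a.raw"}))
```

In a design like this, changing a workflow means editing the list of stage names, and adding a stage means registering one more function.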
- Includes built-in monitoring and logging through the Prefect UI dashboard
- Supports multiple input and output formats and integrates with cloud services such as AWS, Snowflake, and Databricks
- Centralizes error handling and validation for easier pipeline management