Soham Butala - Prefect Workflows for Scaling Acoustic Fisheries Survey Pipelines | PyData Global 2023

Learn how EcoFlow uses Prefect to orchestrate and scale acoustic fisheries data pipelines, featuring modular design, distributed processing, and cloud integration capabilities.

Key takeaways
  • EcoFlow is a Python package that uses Prefect to orchestrate and scale acoustic fisheries survey data processing pipelines

  • Key components include EcoPipe for raw data processing, EcoShader for visualization, and EcoRegions for geospatial analysis

  • Configuration is managed through two YAML files:

    • Data source configuration
    • Processing steps and workflow definitions
  • The system offers flexible deployment options:

    • Local machine execution
    • Cloud platforms (AWS, Azure, GCP)
    • Docker containers
    • EC2 instances
  • Built on Prefect's architecture, with:

    • Flows and tasks for pipeline organization
    • Storage blocks for code storage
    • Infrastructure blocks for execution environment
    • REST API support
  • Supports distributed processing through the Dask and Ray libraries

  • Processes large-scale sonar data (250+ terabytes) using parallel execution

  • Provides a modular design that allows custom processing stages and workflow modifications

  • Includes built-in monitoring and logging through the Prefect UI dashboard

  • Supports multiple input/output formats and integrates with cloud services such as AWS, Snowflake, and Databricks

  • Error handling and validation systems are centralized for better pipeline management
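
To make several of these takeaways concrete (two separate configurations, pluggable processing stages, parallel execution, and centralized error handling), here is a minimal sketch using only the Python standard library. It is not EcoFlow's actual API. The dictionaries stand in for the two YAML files, and every name in it (`DATASET_CONFIG`, `PIPELINE_CONFIG`, `stage`, `run_pipeline`, the file names, and the stage functions) is hypothetical. In a Prefect-based pipeline the stages would presumably be tasks inside a flow, with Dask or Ray taking the place of the thread pool used here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two YAML files, shown as already-parsed dicts.
DATASET_CONFIG = {  # "data source" configuration
    "name": "example_survey",
    "files": ["a.raw", "b.raw", "bad.txt", "c.raw"],
}
PIPELINE_CONFIG = {  # "processing steps" configuration
    "stages": ["validate_extension", "convert", "summarize"],
    "max_workers": 2,
}

# Registry that maps stage names (as written in the config) to functions.
STAGES = {}


def stage(func):
    """Register a processing stage under its function name."""
    STAGES[func.__name__] = func
    return func


@stage
def validate_extension(item):
    # Reject inputs that are not raw sonar files (illustrative rule only).
    if not item["path"].endswith(".raw"):
        raise ValueError(f"unsupported file type: {item['path']}")
    return item


@stage
def convert(item):
    # Placeholder for a real conversion step; only the output name is derived.
    return {**item, "converted": item["path"].replace(".raw", ".zarr")}


@stage
def summarize(item):
    return {**item, "summary": f"processed {item['converted']}"}


def run_one(path, stage_names):
    """Run every configured stage on one file, recording failures centrally."""
    item = {"path": path}
    for name in stage_names:
        try:
            item = STAGES[name](item)
        except Exception as exc:
            # One place decides how a failure is reported for every stage.
            return {"path": path, "ok": False,
                    "failed_stage": name, "error": str(exc)}
    return {**item, "ok": True}


def run_pipeline(dataset_cfg, pipeline_cfg):
    """Validate the stage list, then process all files in parallel."""
    unknown = [s for s in pipeline_cfg["stages"] if s not in STAGES]
    if unknown:
        raise KeyError(f"unknown stages: {unknown}")
    with ThreadPoolExecutor(max_workers=pipeline_cfg["max_workers"]) as pool:
        # pool.map preserves input order, so results line up with the files.
        return list(pool.map(
            lambda p: run_one(p, pipeline_cfg["stages"]),
            dataset_cfg["files"],
        ))


results = run_pipeline(DATASET_CONFIG, PIPELINE_CONFIG)
for r in results:
    print(r["path"], "ok" if r["ok"] else f"failed at {r['failed_stage']}")
```

Because stages are looked up by name, adding a custom processing step in this sketch means registering one more function and listing it in the pipeline configuration. A failing file is reported in the results without stopping the rest of the batch.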