Pryce Turner - Orchestrating Bioinformatics Workflows Across a Heterogeneous Toolset with Flyte

Learn how Flyte enables robust bioinformatics workflows by orchestrating diverse tools and languages, with features like strong typing, versioning, and automated scaling.

Key takeaways
  • Flyte orchestrates bioinformatics workflows whose tools are written in different languages (Python, Java, C++, R) by wrapping them in containers behind a common orchestration layer

  • Key features include:

    • Strong typing for inputs/outputs
    • Version control for workflows and tasks
    • First-class data flow management
    • Granular caching mechanisms
    • Built-in parallelization via map tasks and dynamic workflows
  • Handles complex bioinformatics requirements:

    • Large data volumes and file management
    • Traditional filesystem dependencies
    • Multiple tool dependencies across languages
    • Resource-intensive computations
    • Quality control and alignment workflows
  • Infrastructure benefits:

    • Runs on vanilla Kubernetes clusters
    • Containerized execution with isolated environments
    • Object store integration for data persistence
    • GPU/accelerator support
    • Automatic dependency management via container images defined with ImageSpec
  • Development workflow improvements:

    • Local testing capabilities
    • Fast failure detection
    • Reusable task components
    • Visual workflow monitoring
    • Compile-time guarantees
    • Automated retry mechanisms for failed steps
  • Support for multiple task types:

    • Python tasks
    • Shell tasks
    • Container tasks
    • Map/reduce operations with failure thresholds
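
The map operations with failure thresholds and the automated retries listed above can be illustrated with a minimal plain-Python sketch. This is not Flyte's implementation and it does not use flytekit. The parameter names `min_success_ratio` and `retries` are chosen to echo options that flytekit exposes on map tasks and tasks, and `gc_content`, `map_with_threshold`, and `run_with_retries` are hypothetical helpers invented for this example. The idea shown is that a per-sample step may fail for a few inputs without failing the whole batch, as long as enough samples succeed.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_retries(fn: Callable[[T], R], item: T, retries: int) -> R:
    """Call fn(item), retrying up to `retries` extra times on any exception."""
    attempt = 0
    while True:
        try:
            return fn(item)
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1


def map_with_threshold(
    fn: Callable[[T], R],
    items: Sequence[T],
    min_success_ratio: float = 1.0,
    retries: int = 0,
) -> List[Optional[R]]:
    """Apply fn to every item and tolerate a bounded fraction of failures.

    Failed items are represented by None in the returned list. If the
    fraction of successful items is below min_success_ratio, the whole
    map fails with RuntimeError.
    """
    outputs: List[Optional[R]] = []
    failures = 0
    for item in items:
        try:
            outputs.append(run_with_retries(fn, item, retries))
        except Exception:
            outputs.append(None)
            failures += 1

    succeeded = len(items) - failures
    if items and succeeded / len(items) < min_success_ratio:
        raise RuntimeError(
            f"only {succeeded}/{len(items)} items succeeded, "
            f"below the required ratio {min_success_ratio}"
        )
    return outputs


def gc_content(read: str) -> float:
    """Hypothetical QC step: fraction of G/C bases in an uppercase read."""
    if not read:
        raise ValueError("empty read")
    return sum(base in "GC" for base in read) / len(read)


# One of the four reads is empty, so 3/4 of the items succeed.
reads = ["GGCC", "ATAT", "", "GCAT"]
print(map_with_threshold(gc_content, reads, min_success_ratio=0.75))
# prints [1.0, 0.0, None, 0.5]
```

Raising `min_success_ratio` above 0.75 for the same reads makes the call raise `RuntimeError`, which mirrors the idea of failing the map step once too many samples have failed.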