Martin Durant - Intake 2 | PyData Global 2023

Learn about Intake 2, a major rewrite focused on simplified data catalog management. Discover key features like auto-detection, framework integration, and unified data access.

Key takeaways
  • Intake 2 is an experimental major rewrite of Intake, focused on simplifying data catalog management and access

  • Key capabilities:

    • Creates data catalogs without requiring server infrastructure
    • Automatically guesses an appropriate reader for a given data source
    • Integrates with multiple frameworks (Pandas, Dask, Ray, etc.)
    • Enables conversion between different data formats and APIs
    • Supports metadata, documentation, and data discovery
  • Core design principles:

    • Minimal setup requirements
    • Framework-agnostic approach
    • Simple reader definitions
    • No built-in authentication/security
    • Focus on open source integration
  • Features:

    • Lazy loading of data
    • Pipeline creation and transformation capabilities
    • Multiple output formats
    • Catalog search functionality
    • Templating support
  • Benefits for users:

    • Reduces boilerplate code
    • Simplifies data access patterns
    • Enables sharing of data catalogs
    • Provides unified interface across different data sources
    • Makes data discovery easier
  • Current status:

    • Alpha version available for testing
    • Looking for community feedback and involvement
    • More readers and converters being added
    • Maintains compatibility with Intake v1 features
    • Integration possibilities with commercial catalogs (Databricks, etc.)
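
To make a few of these ideas concrete (a catalog that is just a shareable text file with no server, readers guessed from the data source, lazy loading, and catalog search), here is a minimal sketch in plain Python. It does not use Intake's actual API: `FAKE_STORE`, `fetch`, `guess_reader`, `Entry`, and `Catalog` are hypothetical names invented for illustration, and the `memory://` URLs stand in for real storage so the example runs anywhere.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for remote storage (not part of Intake).
FAKE_STORE = {
    "memory://cities.csv": "name,population\nOslo,700000\nBergen,290000\n",
    "memory://config.json": '{"units": "metric"}',
}
fetch_log = []  # records every URL actually fetched, to make laziness visible


def fetch(url):
    """Return the raw text behind a URL and log the access."""
    fetch_log.append(url)
    return FAKE_STORE[url]


def read_csv_text(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Reader guessing: map a file suffix to a function that parses the raw text.
READERS_BY_SUFFIX = {".csv": read_csv_text, ".json": json.loads}


def guess_reader(url):
    suffix = PurePosixPath(url).suffix.lower()
    if suffix not in READERS_BY_SUFFIX:
        raise ValueError(f"no reader known for {url!r}")
    return READERS_BY_SUFFIX[suffix]


@dataclass
class Entry:
    url: str
    description: str = ""

    def read(self):
        # Lazy: nothing is fetched or parsed until read() is called.
        return guess_reader(self.url)(fetch(self.url))


@dataclass
class Catalog:
    entries: dict = field(default_factory=dict)

    def search(self, term):
        """Return names of entries whose name or description contains term."""
        term = term.lower()
        return [
            name
            for name, entry in self.entries.items()
            if term in name.lower() or term in entry.description.lower()
        ]

    def to_json(self):
        """Serialize the catalog to plain text that can be shared as a file."""
        return json.dumps(
            {
                name: {"url": entry.url, "description": entry.description}
                for name, entry in self.entries.items()
            }
        )

    @classmethod
    def from_json(cls, text):
        return cls({name: Entry(**spec) for name, spec in json.loads(text).items()})


catalog = Catalog(
    {
        "cities": Entry("memory://cities.csv", "Population of Norwegian cities"),
        "settings": Entry("memory://config.json", "Unit settings"),
    }
)

# Sharing needs no server: the catalog round-trips through plain text.
shared = Catalog.from_json(catalog.to_json())
matches = shared.search("population")
rows = shared.entries[matches[0]].read()  # only now is any data fetched
```

Building, serializing, and searching the catalog touches no data; only the final `read()` call fetches anything, which is the behavior that lazy loading describes. A real catalog would swap the suffix lookup for richer type detection and return framework objects (for example a pandas or Dask dataframe) instead of plain lists.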