Data Streaming? I don't even know her by Julien Contarin
Learn how data streaming powers modern applications, from Apache Kafka basics to emerging standards. Discover best practices for real-time data processing & architecture.
- Apache Kafka remains the core open standard for event streaming, powering most modern event-driven applications
- Key components of modern data streaming architecture:
  - Stream: Real-time messaging and data transport
  - Connect: Integration with databases, SaaS solutions and other systems
  - Govern: Schema management, security, lineage tracking
  - Process: Data transformation and enrichment
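
The four components above can be pictured as stages of one pipeline. The sketch below is a toy, in-memory illustration only: the function names, the `REQUIRED_FIELDS` schema and the `deque` standing in for a topic are all assumptions made for the example, not part of Kafka or of the talk.

```python
from collections import deque

# Assumed schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"order_id": int, "amount_cents": int}


def connect_source(rows):
    """Connect: pull records from an external system (here, a plain list)."""
    for row in rows:
        yield dict(row)


def govern(record):
    """Govern: reject records that do not match the agreed schema."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected_type):
            raise ValueError(f"schema violation on field {name!r}")
    return record


def stream(topic, record):
    """Stream: append the record to the transport (a deque stands in for a topic)."""
    topic.append(record)


def process(records):
    """Process: enrich each record with a derived field."""
    for record in records:
        yield {**record, "amount_eur": record["amount_cents"] / 100}


def run_pipeline(rows):
    """Wire the four stages together for one batch of source rows."""
    topic = deque()
    for record in connect_source(rows):
        stream(topic, govern(record))
    return list(process(topic))
```

Calling `run_pipeline([{"order_id": 1, "amount_cents": 1250}])` returns the same record with an added `amount_eur` field, while a row missing `order_id` raises `ValueError` before it ever reaches the topic.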
 
- Storage costs have decreased significantly in cloud environments, but compute remains expensive, so the focus should be on optimizing compute usage
- A shift-left approach is recommended for data processing: handle transformations upstream, close to where data is produced, rather than downstream
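
As a rough illustration of shift-left, the sketch below applies a cleanup step once on the producer side, so that two different consumers can read the same events without repeating it. The `normalize` rules, the event fields and the two consumer functions are hypothetical and chosen only for the example.

```python
def normalize(event):
    """Transformation applied once, at the producer side (hypothetical rules)."""
    return {
        "user_id": str(event["user_id"]).strip(),
        "country": event["country"].upper(),
    }


def produce(raw_events):
    """Publish already-normalized events so consumers need no cleanup of their own."""
    return [normalize(event) for event in raw_events]


def analytics_consumer(events):
    """Analytical use: count events per country."""
    counts = {}
    for event in events:
        counts[event["country"]] = counts.get(event["country"], 0) + 1
    return counts


def operational_consumer(events):
    """Operational use: list the users located in France."""
    return [event["user_id"] for event in events if event["country"] == "FR"]
```

Because normalization happens upstream, neither consumer has to know that the raw input mixed `"fr"` and `"FR"` or padded identifiers with spaces.
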
- Data products should be:
  - Discoverable through catalogs
  - Schema-governed
  - Producer-owned
  - Available to consumers in real-time
  - Secured and properly governed
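
One way to make these properties concrete is a small descriptor plus a catalog. The sketch below is a minimal in-memory model under assumed names (`DataProduct`, `Catalog`, `tags`, `allowed_consumers`); it does not correspond to any specific catalog or schema registry API, and it leaves out real-time delivery entirely.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    name: str
    owner: str                       # producer-owned: the producing team is accountable
    schema: dict                     # schema-governed: field name -> Python type
    tags: frozenset = frozenset()    # used for discovery
    allowed_consumers: frozenset = frozenset()  # secured: explicit allow-list

    def validate(self, record):
        """Return True when the record has exactly the declared fields and types."""
        if set(record) != set(self.schema):
            return False
        return all(isinstance(record[k], t) for k, t in self.schema.items())


class Catalog:
    """Hypothetical registry that makes data products discoverable."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[product.name] = product

    def search(self, tag):
        """Return the names of all products carrying the given tag, sorted."""
        return sorted(p.name for p in self._products.values() if tag in p.tags)

    def can_read(self, name, consumer):
        """Return True when the consumer is on the product's allow-list."""
        product = self._products.get(name)
        return product is not None and consumer in product.allowed_consumers
```

A producer team would register its product once, and consumers would find it by tag and be checked against the allow-list before reading.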
 
- Modern data architecture considerations:
  - Multi-tenancy support
  - Quota management
  - Cost optimization through elastic scaling
  - Integration with analytical and operational systems
  - Support for both real-time and batch processing
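
To make the multi-tenancy and quota points more tangible, here is a sketch of a per-tenant token bucket, one common way to express a throughput quota. It is an illustration under assumed names (`TenantQuota`, `rate`, `burst`) and is not a description of how any particular broker or managed service enforces quotas. The caller passes the current time explicitly so the behaviour is easy to reason about.

```python
class TenantQuota:
    """Token bucket per tenant: `rate` bytes per second, bursts up to `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self._state = {}  # tenant -> (available tokens, timestamp of last update)

    def allow(self, tenant, size, now):
        """Return True and consume tokens if the tenant may send `size` bytes at time `now`."""
        tokens, last = self._state.get(tenant, (self.burst, now))
        # Refill for the elapsed time, never exceeding the burst capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if size <= tokens:
            self._state[tenant] = (tokens - size, now)
            return True
        # Not enough budget: record the refill but consume nothing.
        self._state[tenant] = (tokens, now)
        return False
```

Each tenant gets its own bucket, so one tenant exhausting its budget does not affect the others.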
 
- Emerging standards and technologies:
  - Apache Iceberg for table formats
  - Apache Flink for stream processing
  - Kafka Connect for standardized integrations
  - KRaft (Kafka Raft) replacing ZooKeeper
 
- The focus is shifting from purely analytical data products to universal data products that serve both operational and analytical needs
- Cloud-native services should provide:
  - Automatic scaling
  - Cost-effective resource utilization
  - Managed infrastructure
  - Built-in high availability
 
- Data streaming is becoming foundational for modern use cases, including real-time analytics, AI/ML, and operational applications