Data Streaming? I don't even know her, by Julien Contarin
Learn how data streaming powers modern applications, from Apache Kafka basics to emerging standards. Discover best practices for real-time data processing and architecture.
-
Apache Kafka remains the core open standard for event streaming, powering most modern event-driven applications
-
Key components of modern data streaming architecture:
- Stream: Real-time messaging and data transport
- Connect: Integration with databases, SaaS solutions and other systems
- Govern: Schema management, security, lineage tracking
- Process: Data transformation and enrichment
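The four components above can be sketched as a toy pipeline. This is purely illustrative: every function name here is a hypothetical stand-in for the role a real system plays (Kafka for Stream, Kafka Connect for Connect, a schema registry for Govern, Flink for Process).

```python
# Illustrative sketch of the four streaming stages as pure functions.
# All names are hypothetical stand-ins for the real components.

def connect(source_rows):
    """Connect: ingest rows from an external system as events."""
    return [{"user_id": r[0], "amount": r[1]} for r in source_rows]

def govern(events, required_fields=("user_id", "amount")):
    """Govern: drop events that violate the expected schema."""
    return [e for e in events if all(f in e for f in required_fields)]

def process(events):
    """Process: enrich each event with a derived field."""
    return [{**e, "amount_cents": int(e["amount"] * 100)} for e in events]

def stream(events, topic_log):
    """Stream: append events to a topic-like log for consumers."""
    topic_log.extend(events)
    return topic_log
```

Chaining them mirrors the data path: `stream(process(govern(connect(rows))), log)`.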
-
Storage costs have decreased significantly in cloud environments, but compute remains expensive - optimization effort should therefore focus on compute usage rather than storage
-
Shift-left approach recommended for data processing - handle transformations upstream, close to where data is produced, rather than downstream in each consumer
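A minimal sketch of shifting left: normalize an event once at the producer, so every downstream consumer receives clean data instead of repeating the transformation. The field names and the list-backed `topic` are hypothetical; a real producer would publish to Kafka.

```python
# Hedged sketch of "shift-left": transform at production time, not in
# each consumer. Field names and the list-backed topic are hypothetical.
from datetime import datetime, timezone

def to_canonical(raw_event: dict) -> dict:
    """Normalize a raw event once, at production time."""
    return {
        "user_id": str(raw_event["uid"]),
        "amount_cents": int(round(float(raw_event["amt"]) * 100)),
        "produced_at": datetime.now(timezone.utc).isoformat(),
    }

def produce(raw_event: dict, topic: list) -> None:
    # Shift-left: the event is already canonical when it hits the stream,
    # so no consumer has to re-parse strings or re-derive amounts.
    topic.append(to_canonical(raw_event))
```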
-
Data products should be:
- Discoverable through catalogs
- Schema-governed
- Producer-owned
- Available to consumers in real-time
- Secured and properly governed
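The "schema-governed" and "producer-owned" properties can be sketched as a producer-side conformance check. In practice this role is played by a schema registry with Avro or Protobuf schemas; the schema dict and checker below are simplified, hypothetical stand-ins.

```python
# Hedged sketch: the producing team owns the schema for its data product
# and validates each record before publication. A real system would use
# Avro/Protobuf with a schema registry; this checker is a simplification.

ORDER_SCHEMA = {"order_id": str, "user_id": str, "total_cents": int}

def conforms(record: dict, schema: dict) -> bool:
    """True if the record has exactly the governed fields, correctly typed."""
    return set(record) == set(schema) and all(
        isinstance(record[field], expected) for field, expected in schema.items()
    )
```

Rejecting non-conforming records at the producer keeps the contract with consumers intact: they can rely on the catalog's schema without defensive parsing.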
-
Modern data architecture considerations:
- Multi-tenancy support
- Quota management
- Cost optimization through elastic scaling
- Integration with analytical and operational systems
- Support for both real-time and batch processing
-
Emerging standards and technologies:
- Apache Iceberg for table formats
- Apache Flink for stream processing
- Kafka Connect for standardized integrations
- KRaft (Kafka's Raft-based metadata quorum) replacing ZooKeeper
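As an example of a standardized integration, here is a sketch of the JSON payload Kafka Connect's REST API (`POST /connectors`) accepts for a source connector. The keys follow the Confluent JDBC source connector's documented configuration; the connection URL, table name, and prefix are hypothetical placeholders.

```python
import json

# Hedged sketch: building a Kafka Connect connector registration payload.
# Connector class and keys follow the Confluent JDBC source connector;
# URL, table, and prefix values are hypothetical.

def jdbc_source_config(name: str, jdbc_url: str, table: str) -> dict:
    """Build the JSON payload for Kafka Connect's POST /connectors."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "tasks.max": "1",
            "connection.url": jdbc_url,
            "mode": "incrementing",              # capture new rows by id column
            "incrementing.column.name": "id",
            "table.whitelist": table,
            "topic.prefix": "db.",               # rows land on topic "db.<table>"
        },
    }

# The payload would be sent to the Connect worker, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        -d '<payload>' http://connect-host:8083/connectors
```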
-
Focus shifting from just analytical data products to universal data products that serve both operational and analytical needs
-
Cloud-native services should provide:
- Automatic scaling
- Cost-effective resource utilization
- Managed infrastructure
- Built-in high availability
-
Data streaming is becoming foundational for modern use cases including real-time analytics, AI/ML, and operational applications