Introduction to Real-Time Analytics with Apache Pinot • Tim Berglund • GOTO 2023

Explore the world of real-time analytics with Apache Pinot, designed for high-volume data processing and fresh insights. Discover the key characteristics, features, and use cases that make it an ideal solution for real-time event processing and analysis.

Key takeaways
  • Real-time analytics requires a fundamental understanding of what has happened, rather than just aggregating data.
  • Apache Pinot is designed for real-time analytics, which is characterized by high data freshness, high query concurrency, and low query latency.
  • Pinot is not intended to provide faster dashboards, but rather to process high-volume data in real time.
  • Data storage requirements for real-time analytics are different from those for batch processing.
  • Pinot’s tiered storage feature allows for efficient storage and querying of large datasets.
  • Segmenting data based on specific criteria enables fast query performance.
  • Pinot uses a distributed architecture: clients send queries to brokers, and real-time tables ingest events from streaming sources such as Kafka topics.
  • Pinot is queryable with SQL, allowing for real-time computation and analysis at query time.
  • Real-time analytics requires specialized observability solutions.
  • Pinot is used by companies like Uber for real-time event processing and analysis.
  • The value of queries lies in filtering, sorting, grouping, and aggregating data.
  • Real-time analytics requires a distributed database that can handle high volume and varying query patterns.
  • Pinot’s architecture includes a broker, a controller, and servers, which work together to provide real-time analytics.
  • Pinot’s about page notes that it uses Apache ZooKeeper to persist cluster state and metadata.
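
To make the broker and server takeaways concrete, here is a minimal sketch in plain Python of a scatter-gather query: each server filters, groups, and aggregates its own slice of the data, and a broker merges the partial results and sorts them. This is a toy model and not Pinot's API or implementation. The names `Event`, `SERVER_SEGMENTS`, `server_partial`, and `broker_query`, along with the ride data, are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    city: str
    status: str
    fare: float


# Hypothetical data: each server holds its own slice (segments) of one table.
SERVER_SEGMENTS = {
    "server-1": [
        Event("Austin", "completed", 12.0),
        Event("Denver", "cancelled", 0.0),
    ],
    "server-2": [
        Event("Austin", "completed", 8.0),
        Event("Denver", "completed", 25.0),
    ],
}


def server_partial(events, status):
    """Filter by status, then group by city and aggregate locally."""
    partial = defaultdict(lambda: [0, 0.0])  # city -> [count, fare_sum]
    for event in events:
        if event.status == status:
            partial[event.city][0] += 1
            partial[event.city][1] += event.fare
    return partial


def broker_query(status):
    """Scatter the query to every server, merge the partials, and sort."""
    merged = defaultdict(lambda: [0, 0.0])
    for events in SERVER_SEGMENTS.values():
        for city, (count, fare_sum) in server_partial(events, status).items():
            merged[city][0] += count
            merged[city][1] += fare_sum
    rows = [(city, count, fare_sum) for city, (count, fare_sum) in merged.items()]
    # Sort by total fare, highest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for row in broker_query("completed"):
        print(row)
```

The design point the sketch tries to capture is that servers return small partial aggregates rather than raw rows, so the broker's merge step stays cheap even when the underlying data is large.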