The Science of Signals: Mastering Telemetry for Observability by Alex Van Boxel, Maximilien Richer

Learn best practices for scalable observability using OpenTelemetry, from proper instrumentation and sampling to cost management and effective alerting strategies.

Key takeaways

Cardinality is crucial for metrics - too many unique combinations of labels/attributes can overwhelm backend systems and increase costs dramatically
Focus instrumentation on what provides actual business value - avoid collecting unnecessary metrics, logs and traces that don’t help troubleshoot issues or understand system behavior
OpenTelemetry provides standardization across observability signals (metrics, logs, traces) through semantic conventions and consistent data types
Sampling strategies are essential for traces - collecting every trace in production is often impractical due to volume, so use head-based or tail-based sampling
Structure logs properly - avoid putting large stack traces directly in logs, use trace IDs to correlate logs with traces, and maintain consistent formats that can be parsed
Consider costs at scale - observability data volume grows dramatically with service count and traffic, requiring careful planning around retention, sampling and aggregation
Start with auto-instrumentation but supplement with manual instrumentation for business-critical paths and custom requirements
Use the OpenTelemetry Collector to decouple instrumentation from backends and provide buffering, filtering and routing capabilities
Implement proper SLOs (Service Level Objectives) to determine what actually requires alerting vs what can be tracked in reports
Dashboard and alert carefully - too many alerts lead to alert fatigue, so focus on actionable alerts that indicate real issues needing human intervention
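
To make the cardinality takeaway concrete, here is a minimal sketch of the arithmetic behind it: the worst-case number of time series for one metric is the product of the distinct values each label can take. The label names and counts are hypothetical and do not come from the talk.

```python
from math import prod


def series_upper_bound(label_values: dict[str, int]) -> int:
    """Worst-case number of time series for a single metric.

    Each unique combination of label values is its own series, so the
    upper bound is the product of the distinct-value counts per label.
    """
    return prod(label_values.values())


# Hypothetical label sets, chosen only for illustration.
bounded = {"method": 5, "status_class": 10, "region": 4}
unbounded = {**bounded, "user_id": 100_000}  # one high-cardinality label

print(series_upper_bound(bounded))    # 200
print(series_upper_bound(unbounded))  # 20000000
```

Adding a single unbounded label such as a user ID multiplies the series count by the number of users, which is why such values usually belong in traces or logs rather than in metric labels.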
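
The head-based sampling mentioned in the takeaways can be sketched as a keep/drop decision made at the start of a trace from the trace ID alone. This is a simplified illustration using only the standard library, not the OpenTelemetry SDK's sampler; the function name `keep_trace` and the 10% ratio are assumptions.

```python
import secrets

MASK_64 = (1 << 64) - 1  # select the low 64 bits of a 128-bit trace ID


def keep_trace(trace_id: int, ratio: float) -> bool:
    """Head-based sampling: decide up front, using only the trace ID.

    The decision is deterministic, so every service that sees the same
    trace ID and ratio reaches the same verdict without coordination.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = int(ratio * (1 << 64))
    return (trace_id & MASK_64) < bound


# Simulate 10,000 random 128-bit trace IDs and keep roughly 10% of them.
trace_ids = [secrets.randbits(128) for _ in range(10_000)]
kept = sum(keep_trace(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Tail-based sampling differs in that the decision is deferred until the whole trace has been collected, which allows keeping, for example, only slow or failed traces at the cost of buffering spans first.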
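
For the logging takeaway, the sketch below shows one way to emit consistently structured, parseable log lines that carry a trace ID for correlation. It uses only Python's standard `logging` and `json` modules; the `JsonFormatter` class, the `checkout` logger name, and the field names are illustrative assumptions rather than an OpenTelemetry API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed set of keys."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Supplied by the caller via `extra`; None when absent.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The trace ID links this line to the full trace, so the log itself
# does not need to carry a large stack trace.
logger.info(
    "payment declined",
    extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"},
)
```

Because every line has the same keys, a backend can parse the logs without per-service rules and can jump from a log line to its trace by ID.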
