The Science of Signals: Mastering Telemetry for Observability by Alex Van Boxel, Maximilien Richer

Learn best practices for scalable observability using OpenTelemetry, from proper instrumentation and sampling to cost management and effective alerting strategies.

Key takeaways
  • Cardinality is crucial for metrics - every unique combination of label/attribute values becomes its own time series, so too many combinations can overwhelm backend systems and increase costs dramatically

  • Focus instrumentation on what provides actual business value - avoid collecting unnecessary metrics, logs and traces that don’t help troubleshoot issues or understand system behavior

  • OpenTelemetry provides standardization across observability signals (metrics, logs, traces) through semantic conventions and consistent data types

  • Sampling strategies are essential for traces - collecting every trace in production is often impractical due to volume, so use head-based or tail-based sampling

  • Structure logs properly - avoid putting large stack traces directly in logs, use trace IDs to correlate logs with traces, and maintain consistent formats that can be parsed

  • Consider costs at scale - observability data volume grows dramatically with service count and traffic, requiring careful planning around retention, sampling and aggregation

  • Start with auto-instrumentation but supplement with manual instrumentation for business-critical paths and custom requirements

  • Use the OpenTelemetry Collector to decouple instrumentation from backends and provide buffering, filtering and routing capabilities

  • Implement proper SLOs (Service Level Objectives) to determine what actually requires alerting vs what can be tracked in reports

  • Dashboard and alert carefully - too many alerts lead to alert fatigue, so focus on actionable warnings that indicate real issues needing human intervention
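
To make the cardinality takeaway concrete, here is a rough back-of-the-envelope sketch. It assumes the worst case, where every label value can occur together with every other, so the number of time series for one metric is the product of the distinct values per label. The label names and counts are hypothetical, chosen only for illustration.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for one metric.

    Assumes every label value can combine with every other one,
    so the result is the product of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical labels and distinct-value counts for a request metric.
base = {"http.method": 5, "http.status_code": 10, "service.instance": 20}

# Adding a single unbounded-looking label (here, a user ID) multiplies everything.
with_user = {**base, "user.id": 10_000}

print(series_count(base))       # 5 * 10 * 20 = 1000
print(series_count(with_user))  # 1000 * 10000 = 10000000
```

One extra high-cardinality label turns a thousand series into ten million, which is why identifiers such as user or request IDs usually belong in traces or logs rather than in metric labels.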
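
The difference between head-based and tail-based sampling can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the OpenTelemetry SDK: the `Span` type, the function names, and the thresholds are all assumptions made for the example. The head-based decision uses nothing but the trace ID, similar in spirit to ratio-based samplers, while the tail-based decision waits until the trace is complete and keeps anything slow or failed.

```python
import random
from dataclasses import dataclass

MASK_64 = (1 << 64) - 1  # use the lower 64 bits of the 128-bit trace ID


def head_sample(trace_id: int, ratio: float) -> bool:
    """Head-based: decide when the trace starts, from the trace ID alone.

    The decision is deterministic, so every service that sees the same
    trace ID reaches the same verdict.
    """
    bound = int(ratio * (1 << 64))
    return (trace_id & MASK_64) < bound


@dataclass
class Span:
    """Hypothetical, simplified span record used only for this sketch."""
    duration_ms: float
    is_error: bool


def tail_sample(
    trace_id: int,
    spans: list[Span],
    slow_ms: float = 500.0,        # assumed latency threshold
    baseline_ratio: float = 0.01,  # assumed share of "boring" traces to keep
) -> bool:
    """Tail-based: decide after the trace has finished.

    Keep every trace containing an error or a slow span; fall back to a
    small ratio-based sample for the rest.
    """
    if any(s.is_error or s.duration_ms >= slow_ms for s in spans):
        return True
    return head_sample(trace_id, baseline_ratio)


# Simulate 100,000 random trace IDs and keep roughly 10% of them up front.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(head_sample(t, 0.10) for t in trace_ids)
print(f"head sampling kept {kept / len(trace_ids):.1%} of traces")
```

Head-based sampling is cheap because unsampled traces are never recorded, but it cannot know in advance which traces will turn out to be interesting. Tail-based sampling can, at the price of buffering complete traces before deciding.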
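
For the logging takeaway, the following sketch shows one way to emit consistently structured, parseable log lines that carry a trace ID for correlation. It uses only Python's standard `logging` and `json` modules. The logger name, field names, and trace ID value are hypothetical, and in a real setup the trace ID would come from the active trace context rather than being passed by hand.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object so it stays parseable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via `extra=`; None when the caller supplied no trace ID.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


# Write to an in-memory buffer here; a real service would log to stdout or a file.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service logger
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Log a short message plus the trace ID instead of embedding a full stack trace.
logger.error(
    "payment declined",
    extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"},  # hypothetical ID
)

entry = json.loads(buffer.getvalue())
print(entry["trace_id"], entry["message"])
```

Because every line is a single JSON object with the same fields, a backend can index on `trace_id` and jump from a log entry to the trace that holds the full error detail.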