"Hodor: Detecting and Addressing Overload in LinkedIn Microservices" by Bryan Barkley

This talk covers how LinkedIn detects and addresses overload in its microservices with Hodor, a monitoring framework that detects overload early, sheds traffic gradually, and adapts to changing traffic patterns.

Key takeaways

Overload detection and remediation are crucial for microservices, which can quickly become overwhelmed and trigger cascading failures.
Hodor is a monitoring framework developed by LinkedIn to detect and address overload in microservices.
Design principles of Hodor include detecting overload early, conservatively signaling overload, and shedding traffic progressively.
Hodor has three main components: overload detectors, a load shedding strategy, and data analysis.
Overload detectors include heartbeat, garbage collection, and thread pool detectors, which monitor specific metrics to detect overload.
The load shedding strategy sheds traffic gradually to avoid cascading failures and retry storms.
Data analysis involves collecting and analyzing metrics to refine overload detection and improve load shedding strategies.
Hodor has been deployed to close to a thousand services in production, with no measurable overhead.
Hodor is designed to be extensible and modular, allowing for easy addition of new detectors and integration with existing systems.
The framework is also designed to be self-healing, allowing it to adapt to changing traffic patterns and service behavior.
Future plans for Hodor include adding additional detectors and improving data analysis capabilities.
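
As a rough illustration of how a heartbeat-style detector might work, the sketch below assumes (this is not taken from the talk) that a background thread repeatedly sleeps for a fixed interval and reports how long it actually slept; consistent lag suggests the process is not getting CPU time when it should. To signal overload conservatively, it only flags overload after several consecutive evaluation windows exceed a violation ratio. The names HeartbeatDetector and sample_heartbeat_ms, and every threshold, are hypothetical and are not Hodor's actual API or defaults.

```python
import time


def sample_heartbeat_ms(expected_ms: float) -> float:
    """Sleep for expected_ms and return the actual elapsed time in ms (hypothetical helper)."""
    start = time.perf_counter()
    time.sleep(expected_ms / 1000.0)
    return (time.perf_counter() - start) * 1000.0


class HeartbeatDetector:
    """Hypothetical sketch: flag overload when heartbeat lag is persistently high."""

    def __init__(self, expected_ms=10.0, lag_threshold_ms=5.0,
                 window_size=100, violation_ratio=0.5, consecutive_windows=3):
        self.expected_ms = expected_ms            # intended sleep interval
        self.lag_threshold_ms = lag_threshold_ms  # lag beyond this counts as a violation
        self.window_size = window_size            # heartbeats per evaluation window
        self.violation_ratio = violation_ratio    # fraction of violations that marks a bad window
        self.consecutive_windows = consecutive_windows  # bad windows needed to signal overload
        self._window = []
        self._bad_windows = 0

    def record(self, actual_ms: float) -> bool:
        """Record one heartbeat observation; returns the current overload flag."""
        lag = actual_ms - self.expected_ms
        self._window.append(lag > self.lag_threshold_ms)
        if len(self._window) == self.window_size:
            ratio = sum(self._window) / self.window_size
            # A single good window resets the streak, so brief spikes never trigger.
            if ratio >= self.violation_ratio:
                self._bad_windows += 1
            else:
                self._bad_windows = 0
            self._window.clear()
        return self.overloaded

    @property
    def overloaded(self) -> bool:
        return self._bad_windows >= self.consecutive_windows
```

In a running service, a background thread could loop on detector.record(sample_heartbeat_ms(detector.expected_ms)). Requiring consecutive bad windows trades a little detection latency for fewer false positives.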
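
Gradual load shedding can be sketched as a drop fraction that ramps up in small steps while overload persists and decays more slowly once it clears. The service then rejects only part of its traffic instead of failing everything at once, which is what tends to provoke client retry storms. This additive-step policy is an assumption for illustration; ProgressiveShedder, its step sizes, and the 90% cap are hypothetical and not necessarily how Hodor sheds load.

```python
import random


class ProgressiveShedder:
    """Hypothetical sketch: shed a growing fraction of requests while overloaded."""

    def __init__(self, step_up=0.1, step_down=0.05, max_drop=0.9, rng=None):
        self.step_up = step_up        # increase per evaluation while overloaded
        self.step_down = step_down    # decrease per evaluation after recovery
        self.max_drop = max_drop      # never drop everything, so recovery stays observable
        self.drop_fraction = 0.0
        self._rng = rng or random.Random()

    def update(self, overloaded: bool) -> float:
        """Adjust the drop fraction from the latest overload signal."""
        if overloaded:
            self.drop_fraction = min(self.max_drop, self.drop_fraction + self.step_up)
        else:
            self.drop_fraction = max(0.0, self.drop_fraction - self.step_down)
        return self.drop_fraction

    def admit(self) -> bool:
        """Return True if the request should be served, False if it should be shed."""
        return self._rng.random() >= self.drop_fraction
```

A service would call update() once per detector evaluation with the current overload flag (for example from the heartbeat detector named in the takeaways) and admit() once per incoming request, returning a fast error for rejected requests.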
