Rareş Muşină – Resilient service-to-service calls in a post-Hystrix world
Rareş Muşină shares insights on resilient service-to-service calls in a post-Hystrix world, exploring alternatives such as Resilience4j, Sentry, gRPC, and Envoy, and emphasizing observability, automation, and discipline as the basis for reliability and adaptability.
- Rareş Muşină’s presentation covers resilient service-to-service calls in a post-Hystrix world.
- Historically, teams relied on Hystrix circuit breakers, but because of Hystrix’s limitations, users are now seeking alternatives.
- Resilience4j is a newer alternative that provides a more straightforward, easy-to-use API; it is a Java library, so it targets JVM applications rather than being language-agnostic.
- Sentry is a service that helps prevent and detect common errors, such as node connection failures, and provides metrics to aid in troubleshooting.
- Netflix uses gRPC to build resilience features, but this work is still in its early stages.
- One of the challenges with resilience is dealing with service providers throttling traffic, which can lead to unhappy users.
- Observability is crucial for understanding the performance of a service and identifying areas for improvement.
- To deal with sudden spikes in traffic, services need to be designed to handle bursts of requests, and not just average traffic.
- Resilience requires discipline in setting timeouts and fallback strategies.
- Multiple teams may be involved in ensuring resilience, including DevOps, SRE, and backend teams.
- Envoy is a service proxy that can be used to enforce resilience features, such as timeouts and circuit breakers.
- Automation and observability are key to ensuring resilience in distributed systems.
- When designing for resilience, make operations idempotent so that a request repeated after a failure does not produce incorrect behavior.
- Capacity planning is essential to ensure that services can handle sudden spikes in traffic.
- Beyond a certain load threshold, services may not be able to handle additional requests, leading to resource starvation.
- When designing services, consider the use of retries, but be aware of the potential for retry storms.
- In the event of failures, services should aim to return a valid response to users, rather than simply failing.
- Resilience is not a one-time effort, but rather an ongoing process that requires continuous monitoring and improvement.
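
The circuit breakers mentioned above (as offered by Hystrix, Resilience4j, and Envoy) can be illustrated with a minimal sketch. This is not the API of any of those tools: the `CircuitBreaker` class, its parameters, and the `clock` argument are hypothetical names chosen for illustration, and a production breaker would also need thread safety and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the target."""


class CircuitBreaker:
    """Hypothetical, single-threaded circuit breaker for illustration only."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of piling load onto a struggling service.
                raise CircuitOpenError("circuit open; call rejected")
            # Reset timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for any remote call, and treat `CircuitOpenError` as a signal to serve a fallback.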
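
The points about disciplined timeouts, fallback strategies, and returning a valid response instead of simply failing can be sketched as follows. This is one possible approach using only the Python standard library; `call_with_timeout`, its parameters, and the shared `_pool` are hypothetical names, not part of any tool named in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary assumption for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, fallback):
    """Run func with a deadline; return fallback() on timeout or error."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a task that is already running is not interrupted.
        future.cancel()
        return fallback()
    except Exception:
        # The call failed outright; still give the user a valid response.
        return fallback()
```

Because the worker thread keeps running after the deadline passes, the timeout here only bounds how long the caller waits. The remote call itself should also carry its own network-level timeout.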
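
The warning about retry storms can be made concrete with a retry helper that caps the number of attempts and spreads them out with exponential backoff and random jitter, so that many clients do not retry in lockstep. The source does not say which retry policy the talk recommends, so this is a sketch under that assumption; `retry_with_backoff` and its parameters are hypothetical, and the wrapped operation is assumed to be idempotent.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call func until it succeeds or max_attempts is reached.

    Waits a random time in [0, cap) between attempts, where cap doubles
    after each failure up to max_delay ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempt budget exhausted: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

Bounding `max_attempts` matters as much as the backoff itself: if every layer of a call chain retries several times, the load on the failing service multiplies at each layer.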
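
For bursts of traffic and the resource starvation that follows once a service passes its capacity, one common mitigation is to cap the number of in-flight calls and reject the excess immediately. This technique is often called a bulkhead; the source does not name it, so treat it as an illustrative substitute. `Bulkhead` and `BulkheadFullError` are hypothetical names.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the concurrency limit is reached and a call is shed."""


class Bulkhead:
    """Limits concurrent calls so one slow dependency cannot exhaust all workers."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: shed load right away instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many calls in flight")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()  # always free the slot, even on error
```

Rejecting early keeps latency predictable for the calls that are admitted, and the rejection count is a useful signal for capacity planning.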