Netflix Hystrix - Latency and Fault Tolerance for Complex Distributed Systems

by Bienvenido David on Dec 21, 2012

Netflix has released Hystrix, a library designed to control points of access to remote systems, services and 3rd party libraries, providing greater tolerance of latency and failure. Hystrix features thread and semaphore isolation with fallbacks and circuit breakers, request caching and request collapsing, and monitoring and configuration. Hystrix evolved out of the resilience engineering work that the Netflix API team began in 2011, and it now processes tens of billions of thread-isolated calls and hundreds of billions of semaphore-isolated calls at Netflix every day. Hystrix is an open source library hosted on GitHub under the Apache License 2.0.

Complex distributed architectures have many dependencies. If the application is not isolated from dependency failures, the application itself is at risk of being taken down. On a high-volume website, a single back-end dependency that becomes slow can saturate all application resources within seconds.

Hystrix helps by protecting against, and giving you control over, latency and failure in dependencies, most commonly those accessed over the network. It helps stop cascading failures and lets you fail fast and recover rapidly, or fall back and degrade gracefully.

Here's how it works. You wrap each call to a dependency in a HystrixCommand object. HystrixCommand follows the command pattern and typically executes in a separate thread. If a call takes longer than the defined threshold, it times out. Hystrix maintains a thread pool (or semaphore) for each dependency and rejects requests, rather than queuing them, when that pool is exhausted. It provides circuit-breaker functionality that can stop all requests to a dependency. You can also implement fallback logic that runs when a request fails, is rejected, times out or is short-circuited. Hystrix also supports request caching and request collapsing.
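To make the circuit-breaker idea concrete, here is a minimal plain-Java sketch. It is not Hystrix's implementation or API: the class SimpleCircuitBreaker and its parameters are hypothetical. It trips after a fixed number of consecutive failures, whereas Hystrix's own breaker, as described in its documentation, trips on an error percentage measured over a rolling time window.

```java
import java.util.function.Supplier;

/**
 * Hypothetical, simplified circuit breaker (not the Hystrix implementation).
 * After failureThreshold consecutive failures the circuit opens, and calls go
 * straight to the fallback without touching the dependency. Once openMillis
 * has elapsed, calls are let through again: one more failure re-opens the
 * circuit, a success closes it.
 */
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // short-circuited: the dependency is not called
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get(); // the failure is absorbed by the fallback
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis(); // (re)open the circuit
        }
    }
}
```

While the circuit is open the dependency is not called at all, which lets a struggling service recover instead of being hammered by requests that are likely to fail anyway.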

Here's a simple Hello World implementation of a HystrixCommand.

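```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class HelloWorldCommand extends HystrixCommand<String> {

    public HelloWorldCommand() {
        // The group key groups commands for reporting and alerting
        super(HystrixCommandGroupKey.Factory.asKey("MyGroup"));
    }

    @Override
    protected String run() {
        // The call to the dependency goes here
        return "Hello World";
    }

    @Override
    protected String getFallback() {
        // Returned on exceptions, time-outs, rejections and short-circuits
        return "Hello Fallback";
    }
}
```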

The group key is used for grouping commands for reporting and alerting purposes. Graceful degradation can be achieved by adding a getFallback() implementation, which gets executed for all types of failures such as exceptions, time-outs, thread pool (or semaphore) rejection and circuit-breaker short-circuiting. Hystrix commands can be executed synchronously with the execute() method.

```java
String s = new HelloWorldCommand().execute();
```
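The following plain-Java sketch imitates what a synchronous execute() does with run() and getFallback(), to show how every failure type (an exception, a time-out or a rejection) ends up in the fallback. It is an illustration under stated assumptions, not Hystrix's code: the class SketchCommand, its constructor parameters and the boundedPool helper are hypothetical, and the sketch needs Java 8 or later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical stand-in for the command pattern (not the Hystrix API):
 * run() executes on a pool thread, and any exception, time-out or
 * rejection is turned into a call to getFallback().
 */
abstract class SketchCommand<R> {
    private final ExecutorService pool;
    private final long timeoutMillis;

    protected SketchCommand(ExecutorService pool, long timeoutMillis) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
    }

    protected abstract R run() throws Exception;

    protected abstract R getFallback();

    /** A pool with no queue: when all threads are busy, submissions are rejected. */
    static ExecutorService boundedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Synchronous execution: waits at most timeoutMillis for run() to finish. */
    public R execute() {
        Callable<R> task = () -> run();
        Future<R> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException e) {
            return getFallback(); // pool exhausted: reject instead of queuing
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker running run()
            return getFallback();
        } catch (ExecutionException e) {
            return getFallback(); // run() threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return getFallback();
        }
    }
}
```

A pool built on a SynchronousQueue has no waiting room, so once every thread is busy the next submission is rejected immediately, which matches the reject-instead-of-queue behavior described for Hystrix's per-dependency pools.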

Hystrix commands can also be executed asynchronously using the queue() method.

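```java
java.util.concurrent.Future<String> future = new HelloWorldCommand().queue();
// ... other work can happen here while the command runs ...
String s = future.get(); // blocks until the result (or the fallback) is available
```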

Hystrix uses the bulkhead pattern to isolate dependencies and limit concurrent access to each of them. Each dependency gets its own thread pool, so latency in one dependency can saturate only the threads in that dependency's pool rather than every thread in the application. Semaphores can be used instead of thread pools; they allow load shedding, but not time-outs. For an in-depth discussion of the pros and cons of using thread pools for dependencies, read "How Hystrix Isolation Works."
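The semaphore variant of the bulkhead is the easiest to sketch in plain Java. The code below is illustrative only: SemaphoreBulkheads and its call method are hypothetical names, not part of Hystrix. Each named dependency gets its own permit count, and a call that finds no free permit is shed to the fallback instead of waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical semaphore bulkhead (not the Hystrix API). Every dependency
 * name maps to its own Semaphore, so a slow dependency can only exhaust its
 * own permits. The action runs on the caller's thread, so this sketch sheds
 * load but cannot time a call out. Exceptions thrown by the action propagate.
 */
class SemaphoreBulkheads {
    private final ConcurrentMap<String, Semaphore> permits = new ConcurrentHashMap<>();
    private final int maxConcurrentPerDependency;

    SemaphoreBulkheads(int maxConcurrentPerDependency) {
        this.maxConcurrentPerDependency = maxConcurrentPerDependency;
    }

    <T> T call(String dependency, Supplier<T> action, Supplier<T> fallback) {
        Semaphore s = permits.computeIfAbsent(
                dependency, k -> new Semaphore(maxConcurrentPerDependency));
        if (!s.tryAcquire()) {
            return fallback.get(); // load shedding: reject rather than queue
        }
        try {
            return action.get();
        } finally {
            s.release(); // always give the permit back
        }
    }
}
```

Because the action runs on the caller's thread, nothing here can interrupt a slow call, which is why semaphore isolation offers load shedding but no time-outs.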

Hystrix comes with a monitoring dashboard, the same one used at Netflix. The Hystrix Dashboard provides near real-time monitoring, alerting and operational control. It displays successes, failures (exceptions thrown by the client), time-outs and thread rejections. Users can make configuration changes on the fly, such as manually short-circuiting a dependency.

To get started, visit the Hystrix documentation at http://github.com/Netflix/Hystrix/wiki, which includes the Getting Started and How To Use guides. Hystrix requires Java 6 or later. Maven users should look for the artifact hystrix-core in the group com.netflix.hystrix (com.netflix.hystrix:hystrix-core). For more information, read the Netflix API Performance and Fault Tolerance presentation and the official Hystrix FAQ. Note that, as of this writing, support for asynchronous dependencies has not been implemented.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.