Yahoo! Open Sources Storm on Hadoop

by Boris Lublinsky on Jun 17, 2013

Apache Hadoop is the de facto standard for Big Data storage and batch processing, while Twitter Storm is quickly becoming a standard for large-scale event processing. Unfortunately, until recently, Storm and Hadoop had to be deployed on two physically separate clusters. Last week Yahoo! announced that it is open sourcing a version of Storm that runs on a Hadoop cluster.

According to Yahoo!, collocating real-time processing (Storm) with batch processing offers a number of advantages over segregated clusters:

  • It provides a huge potential for elasticity. Real-time processing will rarely produce a constant and predictable load. As such, Storm needs more resources to keep up with spikes in demand. Collocating Storm with batch processing allows Storm to steal resources from batch jobs when needed and give them back when demand subsides. The Storm-YARN effort lays the groundwork to make this possible.
  • Many applications use Storm for low-latency processing and Map/Reduce for batch processing while sharing data between Storm and Map/Reduce. By placing Storm physically closer to the data source and/or other components in the same pipeline we can reduce network transfers and in turn the total cost of acquiring the data.
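The elasticity argument in the first point can be illustrated with a toy model. The following is only a minimal sketch: it does not use any YARN or Storm API, and the class name `SharedCluster`, the notion of abstract "slots", and the load figures are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SharedCluster:
    """Toy model of one cluster shared by a streaming and a batch workload."""
    total_slots: int      # hypothetical capacity, in abstract "slots"
    storm_slots: int = 0  # slots currently held by the streaming side

    @property
    def batch_slots(self) -> int:
        # Batch work uses whatever the streaming side does not hold.
        return self.total_slots - self.storm_slots

    def rebalance(self, storm_demand: int) -> None:
        # Grant the streaming side what it asks for, capped at capacity.
        # A lower demand hands slots back to the batch side.
        self.storm_slots = max(0, min(storm_demand, self.total_slots))


cluster = SharedCluster(total_slots=100)
history = []
for demand in [10, 10, 60, 90, 20]:  # made-up load with a spike in the middle
    cluster.rebalance(demand)
    history.append((cluster.storm_slots, cluster.batch_slots))

for storm, batch in history:
    print(f"storm={storm:3d}  batch={batch:3d}")
```

With two separate clusters, the streaming side would have to be sized for the spike (90 slots here) all the time; in the shared model those slots go back to batch work once demand subsides.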

The Storm-Hadoop integration leverages Hadoop’s new resource manager, YARN:

Storm-on-YARN enables Storm applications to utilize the computational resources [of] tens of thousands of Hadoop computation nodes. YARN is used to launch the Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).

Storm-YARN integration provides a standard Storm configuration file including YARN specific parameters, enabling the configuration of the initial number of supervisors to be launched and the memory size of the container to be allocated for each supervisor.
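As a rough illustration of the two settings mentioned above, the sketch below builds such a configuration fragment in Python and estimates the memory the initial supervisors would request. The key names `master.initial-num-supervisors` and `master.container.size-mb` and the values are assumptions made for this example; they should be checked against the Storm-YARN project's own default configuration before use.

```python
# Assumed key names and made-up values; verify against the Storm-YARN
# project's default configuration file before relying on them.
storm_yarn_settings = {
    "master.initial-num-supervisors": 2,  # supervisors launched at startup
    "master.container.size-mb": 2048,     # memory per supervisor container
}


def render_yaml(settings: dict) -> str:
    """Render flat key/value settings as simple 'key: value' YAML lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


def total_supervisor_memory_mb(settings: dict) -> int:
    """Memory the initial supervisors would request from YARN, in MB."""
    return (settings["master.initial-num-supervisors"]
            * settings["master.container.size-mb"])


rendered = render_yaml(storm_yarn_settings)
initial_memory_mb = total_supervisor_memory_mb(storm_yarn_settings)

print(rendered)
print(f"initial supervisor memory: {initial_memory_mb} MB")
```

The rendered lines are the kind of entries that would be appended to a standard Storm configuration file; the memory estimate is simply the number of supervisors multiplied by the container size.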

In addition, Yahoo! has enhanced Storm to support Hadoop-style security mechanisms, which enables Storm applications to directly access Hadoop data stored on both HDFS and HBase.

According to Loraine Lawson:

One of the more promising applications of Hadoop and other Big Data solutions is the ability to deliver information in real time. It’s not mentioned often, which is unfortunate, because it’s really a game-changer for some organizations, with far-reaching implications for the rest of us.

Integrating Storm's real-time event processing with Hadoop, alongside real-time Hadoop queries, brings us one step closer to that vision.



Tell us what you think


Spelling by justin holmes

Should it not be Twitter instead of Tweeter

question by 张 松伟

"Physically different clusters"? Can anyone give me a detailed explanation?
