BT
x Your opinion matters! Please fill in the InfoQ Survey about your reading habits!

Yahoo! Open Sources Storm on Hadoop

by Boris Lublinsky on Jun 17, 2013 |

Apache Hadoop is de facto standard for Big Data storage and batch processing, while Tweeter Storm is quickly becoming a standard for large-scale event processing implementations. Unfortunately, up until recently, Storm and Hadoop required two physically different clusters for their implementation. Last week Yahoo! announced open sourcing Storm running on a Hadoop cluster.

According to Yahoo!, collocating real-time processing (Storm) with batch processing offers a number of advantages over segregated clusters:

  • It provides a huge potential for elasticity. Real-time processing will rarely produce a constant and predictable load. As such, Storm needs more resources to keep up with spikes in demand. Collocating Storm with batch processing allows Storm to steal resources from batch jobs when needed and give them back when demand subsides. The Storm-YARN effort lays the groundwork to make this possible.
  • Many applications use Storm for low-latency processing and Map/Reduce for batch processing while sharing data between Storm and Map/Reduce. By placing Storm physically closer to the data source and/or other components in the same pipeline we can reduce network transfers and in turn the total cost of acquiring the data.

The Storm-Hadoop integration is leveraging Hadoop’s new resource manager YARN

Storm-on-YARN enables Storm applications to utilize the computational resources [of] tens of thousands of Hadoop computation nodes. YARN is used to launch the Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).

Storm-YARN integration provides a standard Storm configuration file including YARN specific parameters, enabling the configuration of the initial number of supervisors to be launched and the memory size of the container to be allocated for each supervisor.

In addition, Yahoo! has enhanced Storm to support Hadoop style security mechanisms, which enables Storm applications to directly access Hadoop data stored on both HDFS and HBase.

According to Loraine Lawson:

One of the more promising applications of Hadoop and other Big Data solutions is the ability to deliver information in real time. It’s not mentioned often, which is unfortunate, because it’s really a game-changer for some organizations, with far-reaching implications for the rest of us.

The integration of real-time event processing implemented by Storm plus Hadoop along with real-time Hadoop queries brings us one step closer to that vision.

 

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Spelling by justin holmes

Should it not be Twitter instead of Tweeter

question by 张 松伟

” physically different clusters“ ? can anyone give me a detail explaination?

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

2 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT