BT
x Share your thoughts on trends and content!

Yahoo! Benchmarks Apache Flink, Spark and Storm

by on Dec 23, 2015 |

Yahoo! has benchmarked three of the main stream processing frameworks: Apache Flink, Spark and Storm.

For stream processing Yahoo! used in the past S4, a platform developed internally, but a decision was made in 2012 to replace it with Apache Storm. They are currently using Storm extensively for various data processing needs running it on ~2,300 nodes. But new data processing frameworks have taken the first stage lately and Yahoo! wanted to compare their performance against Storm’s, so they devised a benchmark published on GitHub.

In this benchmark, Yahoo! compared Apache Flink, Spark and Storm. The application tested is related to advertisement, having 100 campaigns and 10 ads per campaign. Five Kafka nodes are used to generate JSON events which are deserialized, passed through a filter, then the events are joined with their associated campaigns and stored into a Redis node. Kafka varied the number of events generated in 10 steps from 50K/s to 170K/s. The entire setup of the benchmark, including the hardware configuration used along with various observations on what was left out and what could be improved is presented in this Yahoo! Engineering post. To be able to compare the three frameworks, Yahoo! measured the percentile latency needed for a touple to be completely processed for each event emitting rate.

According to Yahoo!, both Flink and Storm showed similar behavior, the percentile latency varying linearly until the 99th percentile when the latency grows exponentially. Storm 0.10.0 could not process data starting at the event rate of 135K/s. Storm 0.11.0 with acking had serious trouble processing data at 150K/s. With acking disabled, Storm 0.11 performed much better, beating Flink, but Yahoo! acknowledges that “with acking disabled, the ability to report and handle tuple failures is disabled also.”

In Yahoo!’s tests, Spark’s results were considerably worse than Flink and Storm’s, going up to 70 sec without back-pressure and 120 sec with back-pressure, compared with less than 1 sec for Flink and Storm, as shown in the following chart (percentile latency depicted for the rate 150K/s):

yahoo-benchmark

Yahoo! concluded their benchmark noting the following:

The bottom line for us is Storm did great. Writing topologies is simple, and it’s easy to get low latency comparable to or better than Flink up to fairly high throughputs. Without acking, Storm even beat Flink at very high throughput, and we expect that with further optimizations like combining bolts, more intelligent routing of tuples, and improved acking, Storm with acking enabled would compete with Flink at very high throughput too.

The competition between near real time streaming systems is heating up, and there is no clear winner at this point. Each of the platforms studied here have their advantages and disadvantages. Performance is but one factor among others, such as security or integration with tools and libraries.

Yahoo! intends to improve the benchmark in the future to keep up with improvements in the respective data processing frameworks, and welcomes the community to contribute and make it better.

Rate this Article

Relevance
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread
Community comments

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Discuss
General Feedback
Bugs
Advertising
Editorial
Marketing
InfoQ.com and all content copyright © 2006-2016 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT

We notice you're using an ad blocker

We understand why you use ad blockers. However to keep InfoQ free we need your support. InfoQ will not provide your data to third parties without individual opt-in consent. We only work with advertisers relevant to our readers. Please consider whitelisting us.