Beam Graduates to Top-Level Apache Project

by Dylan Raithel on Feb 21, 2017. Estimated reading time: 1 minute

Beam recently graduated to a top-level project at the Apache Software Foundation. Beam's goals include letting users process unbounded, out-of-order, global-scale data with portable, high-level data pipelines. Beam began as an internal Google project that was later moved to Apache, and it was in incubation from February 2016 through late last year. The project seeks to create a unified programming model for streaming and batch processing jobs, and to produce artifacts that can be consumed by a number of supported data processing engines. Beam seeks to:

provide the world with an easy-to-use, but powerful model for data-parallel processing, both streaming and batch, portable across a variety of runtime platforms... The Beam SDKs use the same classes to represent both bounded and unbounded data, and the same transforms to operate on that data.

The SDKs, available in Java and Python, provide an abstraction between the processing pipeline components and the background processing engine of choice. Supported processing engines include Apache Apex, Apache Flink, Apache Spark, and Google's Cloud Dataflow.

The Beam programming model involves PCollections, Transforms, and Pipeline I/O, along with a runner for each supported processing engine. If no runner is specified, Beam defaults to a local DirectRunner.

Google's motivation for open-sourcing Beam is part of an emerging business model built on integrating with, and contributing to, other open-source projects. The rationale is that doing so will increase adoption of Beam, giving Google's Cloud Dataflow platform more exposure and a chance to emerge as the processing engine of choice among the supported engines. Google's comparison of Spark and Beam presents the Beam model as the correct model for stream and batch data processing because of its focus on the semantics enabled by event-time windowing, watermarks, and triggers. The open-source community and the broader data science industry have yet to validate these claims independently of Google; doing so will require more use-case analysis around architecture and benchmarking. Early signs indicate a growing Beam community and positive feedback on its support for multiple processing platforms.
