Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is a managed data and analytics processing service based on the open-source Hadoop and Spark frameworks. Google recently announced the alpha availability of Cloud Dataproc for Kubernetes, which is intended to give customers an efficient way to process data across platforms.

The Cloud Dataproc service has been generally available for over three years and now offers alpha access to running Apache Spark jobs on Google Kubernetes Engine (GKE) clusters, a capability aimed at developers and data scientists. Spark applications typically run on Hadoop YARN clusters. With Cloud Dataproc for Kubernetes, however, users get one central view that spans both YARN and Kubernetes clusters, so they do not need to manage them separately. Furthermore, according to the announcement blog post, support for both cluster types will give enterprises more flexibility to modernize specific hybrid workloads while continuing to monitor YARN-based workloads.

Running Apache Spark on Kubernetes differs from running it on virtual machine-based Hadoop clusters, which is the mechanism currently provided by the existing Cloud Dataproc service and by competing offerings such as Amazon Web Services (AWS) Elastic MapReduce (EMR) and Microsoft's Azure HDInsight (HDI).

Apache Spark is the first open-source processing engine Google brings to Cloud Dataproc on Kubernetes. The tech giant plans to bring other open-source analytics components to Kubernetes as well, such as Apache Flink, Presto and Apache Druid. Furthermore, hybrid cloud products like Anthos aim to make GKE runnable virtually anywhere, allowing customers to run Cloud Dataproc in their own data centers, or eventually within the Amazon Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).

In the same Google announcement blog post, Matt Aslett, research vice president at 451 Research, said:

Enterprises are increasingly looking for products and services that support data processing across multiple locations and platforms. The launch of Cloud Dataproc on Kubernetes is significant in that it provides customers with a single control plane for deploying and managing Apache Spark jobs on Google Kubernetes Engine in both public cloud and on-premises environments.

Customers who want to try Cloud Dataproc for Kubernetes must apply for access by emailing Google. The alpha release is intended for testing and experimentation only. More details on Cloud Dataproc for Kubernetes are available in the How to Get Started blog post.
