
Modern Big Data Pipelines over Kubernetes

by Srini Penchikala on Jan 08, 2018. Estimated reading time: 2 minutes

Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.

In the past, big data solutions were mainly based on Hadoop, but the ecosystem has evolved in recent years with new databases, streaming data and machine learning solutions, which require more than the Hadoop deployment model (MapReduce, YARN and HDFS). These solutions also require a cluster scheduling layer to host diverse workloads such as Kafka, Spark and TensorFlow, working with data stored in databases like Cassandra, Elasticsearch and cloud-based storage.

Bivas talked about the different teams typically involved in the software development lifecycle and their primary objectives. Application engineers want agile software development, whereas data engineers care about where the data is and want the database systems to keep working. DevOps teams want all the systems to work with less maintenance and fewer disruptions. Thanks to the container technology revolution, organizations can now meet all of these objectives.

He discussed a common framework for creating cloud native end-to-end analytics applications. Developers should decouple the data services from applications and frameworks to make big data solutions flexible and efficient. This decoupling also benefits the data services themselves, which are typically used to manage different types of data: unstructured, structured or streaming.

Ideally, the solutions should be based on cloud native applications and frameworks and use the unified orchestration provided by Kubernetes.

Bivas described the continuous analytics flow model, which places data services in the middle so that data coming from operational data stores (relational databases) and external sources (IoT) can be analyzed using containerized big data analytics tools like Spark and TensorFlow.

Serverless frameworks like Kubeless and OpenFaaS are a good fit for these solutions. Serverless functions are easy to deploy, with no YAML files, Dockerfiles, or build steps involved. They also support auto scaling and event triggers.
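
To make the event-trigger idea concrete, here is a minimal sketch of what such a function might look like in Python. The `(event, context)` handler shape is loosely modeled on the one used by Kubeless's Python runtime; the contents of the event, the field names `vehicle_id` and `engine_temp_c`, and the temperature threshold are all hypothetical and were not part of the talk.

```python
import json

# Hypothetical threshold, chosen only for illustration.
MAX_ENGINE_TEMP_C = 110.0


def handler(event, context):
    """Handle a single event. The platform, not this code, decides when to call it."""
    payload = event["data"]
    # Assumption: the payload may arrive as raw JSON text or as an already-parsed dict.
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)
    overheated = payload["engine_temp_c"] > MAX_ENGINE_TEMP_C
    return json.dumps({"vehicle_id": payload["vehicle_id"], "overheated": overheated})


if __name__ == "__main__":
    # Local stand-in for an event that the platform would normally deliver.
    fake_event = {"data": {"vehicle_id": "v-1", "engine_temp_c": 120.5}}
    print(handler(fake_event, context=None))
```

The function contains only business logic; packaging, scaling and wiring it to an event source are left to the serverless framework.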

Bivas discussed the architecture details of Nuclio, a real-time serverless platform which was recently open sourced. The architecture involves using Kubernetes as an alternative to YARN, and using frameworks like Spark ML, Presto, TensorFlow & Python and serverless functions coupled with local and cloud-based storage. Nuclio also supports pluggable event sources and data sources.

He also talked about an automotive customer use case of real-time analytics for vehicle maintenance. In this solution, vehicle data is streamed through web APIs and microservices handle data ingestion. The vehicle data is enriched in real time with weather and road data to find correlations between weather conditions and the state of vehicle components.
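
The enrichment step in this use case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the customer's implementation: the record fields (`hour`, `brake_wear`, `rain_mm`), the join on a shared hour key, and the sample values are all made up, and the Pearson coefficient stands in for whatever correlation analysis the real pipeline performs.

```python
from math import sqrt


def enrich(vehicle_readings, weather_by_hour):
    """Attach the weather observation for the same hour to each vehicle reading."""
    enriched = []
    for reading in vehicle_readings:
        weather = weather_by_hour.get(reading["hour"])
        if weather is None:
            continue  # no weather observation for this hour, so skip the reading
        enriched.append({**reading, **weather})
    return enriched


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    readings = [
        {"hour": 0, "brake_wear": 1.0},
        {"hour": 1, "brake_wear": 2.0},
        {"hour": 2, "brake_wear": 3.0},
        {"hour": 3, "brake_wear": 9.0},  # dropped: no weather for hour 3
    ]
    weather = {0: {"rain_mm": 0.0}, 1: {"rain_mm": 2.0}, 2: {"rain_mm": 4.0}}
    rows = enrich(readings, weather)
    r = pearson([row["rain_mm"] for row in rows], [row["brake_wear"] for row in rows])
    print(len(rows), r)
```

In a streaming deployment the same join would run per event against a continuously updated weather lookup rather than over an in-memory list.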

The presentation included a demo showing the benefits of running big data analytics over a cloud native architecture. Bivas concluded the session with some best practices: developers should know the tools provided by Kubernetes, implement application logging, collect metrics, and use those metrics to gain insight into application performance.
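
As a rough illustration of the logging and metrics advice, the sketch below uses only the Python standard library. The `timed` decorator, the `durations` store and the `ingest` function are hypothetical names introduced here; a real deployment would more likely ship logs to the cluster's log collector and export measurements to a dedicated metrics system.

```python
import logging
import time
from collections import defaultdict
from functools import wraps

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("pipeline")

# In-process store of call durations in seconds, keyed by function name.
durations = defaultdict(list)


def timed(func):
    """Log every call to func and record how long it took."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            durations[func.__name__].append(elapsed)
            log.info("%s finished in %.6f s", func.__name__, elapsed)
    return wrapper


def summarize(name):
    """Return the call count and mean duration recorded for one function."""
    samples = durations.get(name, [])
    mean = sum(samples) / len(samples) if samples else 0.0
    return {"calls": len(samples), "mean_s": mean}


@timed
def ingest(batch):
    """Stand-in for a pipeline step; it only counts the records it was given."""
    return len(batch)


if __name__ == "__main__":
    ingest([1, 2, 3])
    print(summarize("ingest"))
```

The summary is the part that turns raw measurements into insight: watching the mean duration of a step over time shows whether the pipeline is slowing down.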

If you are interested in learning more about the Nuclio framework, check out their GitHub project, code examples and the documentation.
 
