Building GPU Accelerated Workflows with TensorFlow and Kubernetes

by Srini Penchikala, Jan 04, 2018. Estimated reading time: 1 minute

Daniel Whitenack spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about GPU based deep learning workflows using TensorFlow and Kubernetes technologies.

He started off discussing a typical artificial intelligence (AI) workflow, using object detection as an example. The workflow includes steps like pre-processing, model training, model generation, and finally model inference. All of these stages of the AI/ML workflow can be executed in Docker containers.

Model training is typically done using a framework like TensorFlow or Caffe. This stage is also where GPUs come into play to improve performance. Deep learning workflows that use TensorFlow or other frameworks need GPUs to train models on image data efficiently.
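As a small sketch of this stage, the snippet below checks whether TensorFlow can see any GPUs before training starts. It assumes the TensorFlow 2.x `tf.config` API, which postdates the talk, and falls back gracefully if TensorFlow is not installed:

```python
# Sketch: detect GPUs visible to TensorFlow before kicking off training.
# Assumes the TensorFlow 2.x API; degrades gracefully without TensorFlow.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    gpus = []

if gpus:
    print(f"Training can use {len(gpus)} GPU(s)")
else:
    print("No GPU visible; training would fall back to CPU")
```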

Model training programs can run on GPU nodes in Kubernetes clusters. Kubernetes provides a convenient framework for managing multiple GPU nodes on the platform. The workflow works best when it follows the steps below:

  • Get the right pieces of data to the right code (pods)
  • Process data on the right nodes
  • Trigger the right code at the right time
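The steps above can be sketched as a Kubernetes pod spec. The example below requests one GPU through the `nvidia.com/gpu` extended resource exposed by the NVIDIA device plugin and uses a `nodeSelector` to land on a GPU node. The pod name, image tag, and node label are illustrative, not from the talk; it is written as a Python dict, but the structure matches the YAML you would submit to the cluster:

```python
# Illustrative pod spec requesting a GPU. The pod name, image, and
# node label are hypothetical examples; the resource name
# "nvidia.com/gpu" is the extended resource exposed by the NVIDIA
# device plugin.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "model-training"},
    "spec": {
        # Schedule onto nodes labeled as GPU nodes (label is cluster-specific).
        "nodeSelector": {"accelerator": "nvidia-gpu"},
        "containers": [
            {
                "name": "trainer",
                "image": "tensorflow/tensorflow:1.4.1-gpu",
                # Ask the scheduler for exactly one GPU on this node.
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ],
    },
}

print(gpu_pod["spec"]["containers"][0]["resources"]["limits"])
```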

Kubernetes can also be used to track which versions of code and data produced which results, which helps with debugging, maintenance, and compliance.

Kubernetes provides the foundation for all of this and is great for machine learning projects because of its portability and scalability.

Whitenack discussed the open source project Pachyderm, which provides a data pipeline and data management layer on top of Kubernetes. The workflows typically involve multi-stage data pre-processing and post-processing jobs. Pachyderm provides a unified framework for scheduling multi-stage workflows, managing data, and offloading workloads to GPUs.
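For flavor, a Pachyderm pipeline is declared with a small JSON spec that names a container image, a command to run, and an input data repository. The sketch below shows the general shape; the pipeline name, repo, image, and command are made-up examples, not from the demo:

```python
import json

# Illustrative Pachyderm pipeline spec. The pipeline name, image,
# command, and input repo are hypothetical examples.
pipeline_spec = {
    "pipeline": {"name": "preprocess"},
    "transform": {
        "image": "example/preprocess:latest",
        "cmd": ["python3", "/code/preprocess.py"],
    },
    # Process each top-level path in the input repo as its own datum.
    "input": {"pfs": {"repo": "raw-images", "glob": "/*"}},
}

print(json.dumps(pipeline_spec, indent=2))
```

Chaining several such specs, each reading the output repo of the previous one, is how Pachyderm builds the multi-stage DAGs described below.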

He talked about the capabilities of the Pachyderm framework, which include the following:

  • Data versioning: versioned data can be stored in an object store such as Amazon S3
  • Containers for analysis
  • Distributed pipelines, or DAGs, for data processing
  • Data provenance: this helps with compliance and debugging

Whitenack also gave a live demo of the AI workflow using Pachyderm and Kubernetes. The sample application implements an image-to-image translation use case in which satellite images are automatically translated into maps. It uses TensorFlow for model training and inference.

If you are interested in learning more about the Pachyderm framework, check out the machine learning examples, the developer documentation, and the Kubernetes GPU documentation, or join the Pachyderm Slack channel.
 
