Amazon Expands Its Machine Learning Offering with AWS Deep Learning Containers

Recently, Amazon introduced AWS Deep Learning Containers (AWS DL Containers), which are Docker images pre-installed with deep learning frameworks that allow customers to deploy custom machine learning environments quickly. 

AWS DL Containers were created by Amazon to remove the "undifferentiated heavy lifting" for customers who regularly use Amazon EKS and ECS to deploy their TensorFlow workloads to the cloud. Amazon has also optimized the images for use on AWS to reduce training time and improve inference performance. As Jeff Barr states in a blog post about the introduction of AWS DL Containers:

"The images are pre-configured and validated so that you can focus on deep learning, setting up custom environments and workflows on Amazon ECS, Amazon Elastic Container Service for Kubernetes, and Amazon Elastic Compute Cloud (EC2) in minutes!"

Note that AWS DL Containers currently support TensorFlow and Apache MXNet, with other frameworks such as Facebook’s PyTorch to follow soon. Dr. Matt Wood, general manager of deep learning and AI at AWS, said onstage at the AWS Summit in Santa Clara:

"We’ve done all the hard work of building, compiling, and generating, configuring, optimizing all of these frameworks, so you don’t have to. And that means that you do less of the undifferentiated heavy lifting of installing these very, very complicated frameworks and then maintaining them."

A typical deployment of a deep learning container starts with a developer creating an ECS cluster with a specific instance size. Once the cluster is running and the container agent is active, the developer can register a task definition that specifies the container(s): framework (TensorFlow or MXNet), mode (training or inference), environment (CPU or GPU), and other factors. On registration, the developer notes the revision number and uses it together with the task definition to create a service. Finally, through the AWS Console, the developer can access the task and run inferences against it.
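
The register-then-create-service steps could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not Amazon's documented procedure: the helper names `build_task_definition` and `deploy_service`, the family naming scheme, the memory size, and the image URI are all hypothetical. The `ecs` argument is assumed to be an ECS client such as the one returned by `boto3.client("ecs")`, whose `register_task_definition` and `create_service` calls are used here.

```python
def build_task_definition(framework, mode, environment, image_uri, memory_mib=4096):
    """Build keyword arguments for ECS register_task_definition.

    All names and defaults here are illustrative assumptions; image_uri
    must be supplied by the caller (the article does not list image URIs).
    """
    if framework not in ("tensorflow", "mxnet"):
        raise ValueError(f"unsupported framework: {framework}")
    if mode not in ("training", "inference"):
        raise ValueError(f"unsupported mode: {mode}")
    if environment not in ("cpu", "gpu"):
        raise ValueError(f"unsupported environment: {environment}")

    container = {
        "name": f"{framework}-{mode}",
        "image": image_uri,
        "memory": memory_mib,
        "essential": True,
    }
    if environment == "gpu":
        # Ask ECS to reserve one GPU for this container.
        container["resourceRequirements"] = [{"type": "GPU", "value": "1"}]

    return {
        "family": f"{framework}-{mode}-{environment}",  # hypothetical naming scheme
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [container],
    }


def deploy_service(ecs, cluster, service_name, task_def_kwargs, desired_count=1):
    """Register the task definition, capture its revision, and create a service."""
    registered = ecs.register_task_definition(**task_def_kwargs)["taskDefinition"]
    # The revision returned at registration pins the service to this exact definition.
    task_ref = f"{registered['family']}:{registered['revision']}"
    ecs.create_service(
        cluster=cluster,
        serviceName=service_name,
        taskDefinition=task_ref,
        desiredCount=desired_count,
    )
    return task_ref
```

With real credentials and an existing cluster, a call might look like `deploy_service(boto3.client("ecs"), "my-cluster", "tf-inference", build_task_definition("tensorflow", "inference", "cpu", image_uri))`, where the cluster name, service name, and `image_uri` are placeholders to be replaced with actual values.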


Large public cloud providers are actively expanding their machine learning (ML) capabilities in the cloud; Amazon’s AWS DL Containers and another recently released service, Amazon Elastic Inference, are examples of that. Additionally, Microsoft will now deliver GPU-accelerated machine learning capabilities and Google is bringing TensorFlow 2.0 support to its cloud platform.

The AWS DL Containers are available through the Amazon Elastic Container Registry (Amazon ECR) and AWS Marketplace at no cost – customers only pay for the resources that they use.
