
Google Cloud TPU for Machine Learning Acceleration is Now Available in Beta

Google has made its custom chips, Tensor Processing Units (TPUs), which run machine learning workloads written for its TensorFlow framework, available in beta to machine learning (ML) experts and developers. With Google's Cloud TPUs, ML models can run on demand at lower cost and higher performance.

Cloud TPUs were first announced by Google ten months ago at its I/O developer conference. Google initially provided access only to a limited group of developers and researchers. Now developers can gain access by requesting a TPU quota and describing their intended use of the service. TechCrunch reports the following:

Once they get in, usage will be billed at $6.50 per Cloud TPU and hour. In comparison, access to standard Tesla P100 GPUs in the U.S. runs at $1.46 per hour, though the maximum performance here is about 21 teraflops of FP16 performance.

A Cloud TPU is built from "AI accelerator" application-specific integrated circuits (ASICs) that Google designed for neural-network machine learning. Each Cloud TPU has four custom ASICs on a single board, providing up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory. Moreover, each board can be used stand-alone or connected through a dedicated high-bandwidth network to form multi-petaflop ML supercomputers, so-called "TPU pods." Google will offer these instance types later this year.
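Putting the quoted figures together gives a rough price/performance comparison. This is a back-of-the-envelope sketch using only the peak numbers cited above; real throughput depends heavily on the workload and on numeric precision:

```python
# Back-of-the-envelope price/performance from the figures quoted above.
# Cloud TPU: ~180 teraflops per board at $6.50 per hour.
# Tesla P100: ~21 teraflops (FP16) at $1.46 per hour.
tpu_tflops, tpu_price = 180.0, 6.50
gpu_tflops, gpu_price = 21.0, 1.46

tpu_tflops_per_dollar = tpu_tflops / tpu_price  # peak TFLOPS per dollar-hour
gpu_tflops_per_dollar = gpu_tflops / gpu_price

print(f"Cloud TPU:  {tpu_tflops_per_dollar:.1f} TFLOPS per dollar-hour")
print(f"Tesla P100: {gpu_tflops_per_dollar:.1f} TFLOPS per dollar-hour")
```

On peak numbers alone, a Cloud TPU delivers roughly twice the floating-point throughput per dollar-hour of a P100, though peak teraflops rarely translate directly into training speed.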

With the availability of Cloud TPUs, ML experts and developers can have their workloads processed more cost-effectively and more rapidly, which will ultimately lead to increased productivity. First, no investment in designing, setting up and installing an on-premise cluster is required -- they can simply push their workloads to the Google Cloud Platform. Second, they no longer need to wait for a scheduled job on a shared compute cluster; for instance, they can get exclusive access to a dedicated Cloud TPU, and if they need more computing power to train a vital model faster, a group of TPUs can be allocated on demand. Finally, Google has open-sourced a set of high-performance reference model implementations for Cloud TPUs, and it offers several high-level TensorFlow APIs for programming them.

Every major public cloud provider offers a wide variety of platform services that let customers push their workloads to the cloud, and machine learning is no different. However, with the combination of TensorFlow and TPUs, Google offers a service that, in the short term, has an edge over its competitors. Jeffrey Burt, an author at TheNextPlatform, wrote in a recent article:

Google is positioning the Cloud TPUs as another weapon in its arsenal as it looks to carve into the market shares of cloud computing leaders Amazon Web Services and Microsoft Azure. The goal is to offer Google Cloud customers options for their machine learning workloads that include a variety of high-performance CPUs, including Skylake, and GPUs, like the Tesla V100 accelerator, along with the Cloud TPUs.

Furthermore, according to technology reporter Jillian D'Onfro, Google will itself benefit from having its chips available on the Google Cloud Platform. In a recent tech article she wrote:

First, by using its own silicon, Google has a cheaper, more efficient alternative to relying on chipmakers like Nvidia and Intel for its core computing infrastructure. Owning its own hardware enables Google to experiment faster. The new TPUs also allow parent company Alphabet to add a revenue stream to the Google Cloud Platform. GCP and Google's collection of business apps called GSuite now generate more than $1 billion a quarter.

To conclude, differentiating and standing out with platform services like these is critical for every public cloud provider seeking to gain market share and value.
