Google Provides a Peek into the Architecture of Colossus - Its Storage Foundation

In a recent post on the Google Cloud blog, Google provided a glimpse into the architecture of Colossus. Colossus underpins Google's scalable storage system, which serves both its Google Cloud offerings and Google's own globally available services such as YouTube, Google Drive, and Gmail.

Colossus consists of five components: the client library, curators, a metadata database, "D" file servers, and custodians. The following image depicts the relationship between these components.


The Colossus client library is used by various client applications as their entry point into the platform. It implements features such as software RAID and allows fine-tuning performance and cost trade-offs for different workloads. 
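The article does not describe which encodings the client library uses, so the following is only a minimal sketch of the general idea behind software RAID: the client splits data into chunks, adds a redundancy chunk, and can rebuild any single lost chunk from the survivors. The single XOR parity scheme and the names `encode_stripe` and `reconstruct` are assumptions chosen for illustration, not the Colossus API.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_stripe(data, n_data):
    """Split data into n_data equal chunks (zero-padded) plus one parity chunk.

    Hypothetical helper: single XOR parity stands in for whatever
    encodings a real client library would offer.
    """
    chunk_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(chunk_len * n_data, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    return chunks + [xor_blocks(chunks)]


def reconstruct(stripe, missing_index):
    """Rebuild the chunk at missing_index by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


stripe = encode_stripe(b"colossus stripes data", n_data=3)
rebuilt = reconstruct(stripe, missing_index=1)
```

With one parity chunk the stripe tolerates the loss of any single chunk, at the cost of one extra chunk of storage. Choosing a different ratio of data to redundancy chunks is one way a client could trade cost against durability and performance for a given workload.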

Curators form the bulk of the Colossus control plane. They handle control operations, such as file creation, and can scale horizontally. Curators store file system metadata in Bigtable, Google's high-performance NoSQL database. Storing file metadata in Bigtable allowed Colossus to scale up by over 100x compared with the largest previous-generation clusters.

The data itself flows directly between the client application and the "D" file servers, which are network-attached disks. Custodians operate on the file servers as well, playing "a key role in maintaining the durability and availability of data as well as overall efficiency, handling tasks like disk space balancing and RAID reconstruction."
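The split between control operations and data flow can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Google's implementation: the class names `Curator`, `DServer`, and `Client`, the round-robin placement, and the dictionary standing in for the metadata database are all hypothetical. The point it shows is that the curator only ever records where chunks live, while the bytes travel directly between the client and the file servers.

```python
from dataclasses import dataclass, field


@dataclass
class DServer:
    """Toy stand-in for a "D" file server: stores chunk bytes by chunk id."""
    name: str
    chunks: dict = field(default_factory=dict)

    def write(self, chunk_id, payload):
        self.chunks[chunk_id] = payload

    def read(self, chunk_id):
        return self.chunks[chunk_id]


class Curator:
    """Toy control plane: tracks which server holds each chunk of a file."""

    def __init__(self, servers):
        self.servers = servers
        # path -> list of (server name, chunk id); a plain dict stands in
        # for the metadata database here.
        self.metadata = {}

    def create(self, path, n_chunks):
        # Round-robin placement is an assumption made for this sketch.
        placement = [
            (self.servers[i % len(self.servers)].name, f"{path}#{i}")
            for i in range(n_chunks)
        ]
        self.metadata[path] = placement
        return placement

    def lookup(self, path):
        return self.metadata[path]


class Client:
    """Toy client library: asks the curator where, then talks to servers."""

    def __init__(self, curator, servers):
        self.curator = curator
        self.servers = {s.name: s for s in servers}

    def write_file(self, path, data, chunk_size=4):
        pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        placement = self.curator.create(path, len(pieces))  # control path
        for (server, chunk_id), piece in zip(placement, pieces):
            self.servers[server].write(chunk_id, piece)  # data path

    def read_file(self, path):
        placement = self.curator.lookup(path)  # control path
        return b"".join(self.servers[s].read(cid) for s, cid in placement)


servers = [DServer("d0"), DServer("d1")]
curator = Curator(servers)
client = Client(curator, servers)
client.write_file("/videos/clip", b"hello world")
restored = client.read_file("/videos/clip")
```

Because the curator in this model handles only small metadata records, adding more curators or more file servers does not put the control plane in the path of bulk data transfer.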

The authors, Dean Hildebrand and Denis Serenyi, explain how Colossus works in action:

With Colossus, a single cluster is scalable to exabytes of storage and tens of thousands of machines. For example, in the example below, we have instances accessing Cloud Storage from Compute Engine VMs, YouTube serving nodes, and Ads MapReduce nodes—all of which can share the same underlying file system to complete requests. The key ingredient is having a shared storage pool managed by the Colossus control plane, providing the illusion that each has its isolated file system. 


Colossus is one of the three building blocks Google uses to implement its storage services. The other two are Spanner, Google's globally consistent, scalable relational database, and Borg, a scalable cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across many clusters, each with up to tens of thousands of machines. Borg is often seen as the conceptual predecessor of Kubernetes and heavily influenced its development.
