
Amazon DynamoDB - Evolution of a Hyper-Scale Cloud Database Service: Akshat Vig at QCon SF 2022

Akshat Vig, a principal engineer in NoSQL databases at Amazon, spoke at QCon San Francisco about Amazon DynamoDB: Evolution of a Hyper-Scale Cloud Database Service. The talk was part of the "Architectures You've Always Wondered About" track.

Vig opened by asking who wants a database that is fully managed and offers predictable performance and high availability. Many people in the room raised their hands. He then explained that the talk would cover the evolution of a hyper-scale cloud database service, Amazon DynamoDB, saying he would:

Talk through the lessons we have learned over the years while building this hyperscale database.

AWS offers 15 purpose-built database engines to support diverse database models ranging from relational to wide-column. Customers can choose the proper database for their use cases. 

Vig then dived into the history of DynamoDB, starting with the question: why DynamoDB?

In the 2004-2005 timeframe, Amazon was facing scaling challenges caused by the relational databases its website was using. The Correction of Errors (CoE) that followed asked how to prevent the issues from happening again: were there more efficient solutions than a relational database for the shopping cart? Vig stated:

Choosing the right database technology is the key to building a system for scale and predictable performance.

At the time, no available solution met the requirements of the shopping cart scenario. Hence, the company built Dynamo between 2004-2005 and 2007 and published the Dynamo paper. Dynamo subsequently went on to support various other services.

Dynamo was created in response to the need for a highly available, scalable, and durable key-value database for the shopping cart, and more and more teams started using it.

Dynamo was a software system that teams had to install and run on resources they owned. It became very popular, yet the feedback was that setting it up and running it was hard. That raised the question: can this not be simpler? And so DynamoDB was born. DynamoDB is the result of everything the company learned from building Dynamo. A key difference between the two is that Dynamo is single-tenant, while DynamoDB is multi-tenant. Another is that routing and storage are coupled in Dynamo but decoupled in DynamoDB.

Regarding DynamoDB, Vig pointed out that customers still demand performance, a fully-managed service, and high availability ten years after the service’s inception.

Vig summarized the key properties that DynamoDB provides:

As hundreds of thousands of customers are adopting the service and the requests rates are increasing - customers running critical workloads on the service - the performance they see is consistent. 

Subsequently, Vig provided an overview of the service capabilities before discussing how it evolved over ten years.

Next, Vig explained how DynamoDB scales and provides predictable performance, using a PutItem request as an example. The request first arrives at a Request Router (RR), which authenticates it via Identity and Access Management (IAM). Once the request is authenticated and authorized, the RR inspects the metadata to find out where the request needs to be routed, a check then verifies whether the item can be stored, and finally the request is sent to the storage node. For every item in DynamoDB, AWS maintains multiple copies (a leader node and two child nodes).
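The request path described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and parameter names (`RequestRouter`, `StorageNode`, `locate`, `has_capacity`) are hypothetical, a plain set of caller names stands in for IAM, and replication to the two additional copies is not modeled. It is not how DynamoDB is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


class Rejected(Exception):
    """Raised when a request fails one of the router's checks."""


@dataclass
class StorageNode:
    # Hypothetical stand-in for the leader storage node of a partition.
    name: str
    items: dict = field(default_factory=dict)


@dataclass
class RequestRouter:
    allowed_callers: set                         # stand-in for the IAM check
    locate: Callable[[str, str], StorageNode]    # stand-in for the metadata lookup
    has_capacity: Callable[[str], bool]          # stand-in for the storage check

    def put_item(self, caller, table, key, value):
        # 1. Authenticate and authorize the caller.
        if caller not in self.allowed_callers:
            raise Rejected("not authorized")
        # 2. Inspect metadata to find where the request must be routed.
        leader = self.locate(table, key)
        # 3. Verify that the write can be accepted.
        if not self.has_capacity(table):
            raise Rejected("throttled")
        # 4. Send the item to the storage node.
        leader.items[(table, key)] = value
        return leader.name


node = StorageNode("leader-1")
router = RequestRouter({"alice"}, lambda table, key: node, lambda table: True)
router.put_item("alice", "carts", "cart-1", {"qty": 2})
```

The checks run in the order the talk lists them, so an unauthorized request is rejected before any metadata lookup happens.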

With its multi-tenant design, DynamoDB has not one RR but many. The service is also designed to be fault-tolerant by making use of the availability zones in each AWS region. Vig then explained how DynamoDB scales.

Next, Vig explained table creation, partition keys, and how tables scale with those keys. He continued with the challenges AWS faces around data distribution and predictable performance, explaining read capacity units (RCU) and the token bucket algorithm the service uses.
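For readers unfamiliar with the token bucket algorithm mentioned above, here is a generic sketch of it. It is a textbook version, not DynamoDB's code: the class name, the explicit `now` argument (used to keep the example deterministic), and the choice of one token per read are assumptions for illustration.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second, up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now          # time of the last refill

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, cost, now):
        # Admit the request only if enough tokens are available.
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A bucket sized for 100 capacity units per second (illustrative numbers).
bucket = TokenBucket(rate=100, capacity=100)
admitted = sum(bucket.try_consume(1, now=0.0) for _ in range(150))
```

In this run the first 100 requests at time 0 are admitted and the remaining 50 are throttled; half a second later the bucket has refilled by 50 tokens.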

What we found out when we launched DynamoDB is that uniform distribution for the workloads is very hard. 

Traffic comes in waves and spikes, and customers can face throttling. That can lead them to provision too much capacity, resulting in waste and increased spending. DynamoDB launched a feature called bursting, which Vig explained solved that problem. In addition, he explained how they solved throughput dilution (another issue) by moving admission control from the partition level to the request router layer, as global admission control, and letting all partitions burst.
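The effect of where admission control sits can be shown with a little arithmetic. The sketch below is a simplified model under stated assumptions, not the service's algorithm: it assumes provisioned throughput is divided evenly across partitions in the partition-level case, the function names and the `partition_max` cap are hypothetical, and request routers and token accounting are not modeled at all.

```python
def admitted_partition_level(demand, table_rcu):
    # Each partition enforces an equal share of the table's throughput.
    limit = table_rcu / len(demand)
    return [min(d, limit) for d in demand]


def admitted_table_level(demand, table_rcu, partition_max):
    # Admission is decided against the table-wide budget; a partition is
    # only capped by a (hypothetical) per-partition maximum.
    remaining = table_rcu
    admitted = []
    for d in demand:
        granted = min(d, partition_max, remaining)
        admitted.append(granted)
        remaining -= granted
    return admitted


# Illustrative numbers: 1000 RCU provisioned, four partitions, one hot partition.
demand = [400, 50, 50, 50]
per_partition = admitted_partition_level(demand, table_rcu=1000)
table_wide = admitted_table_level(demand, table_rcu=1000, partition_max=3000)
```

With partition-level limits the hot partition is capped at 250 even though the table as a whole uses only 550 of its 1000 units; with a table-wide budget all 400 units of demand on the hot partition are admitted.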

Still, there were scenarios where traffic reached the maximum for a specific partition, which Vig said was solved by the split for consumption capability. Finally, to make provisioning read and write capacity easier, the company introduced pay-per-request.

Vig concluded with the lesson:

Adapting to customers’ traffic patterns to reshape the physical partitioning schema of the database tables improves customer experience.

The next topic Vig touched upon was how DynamoDB provides high availability. He started by explaining that many customers had moved to DynamoDB over the years and discussed a disruption that happened during that time, one that led to improvements in the service. The disruption was caused by the caches in use at the time, which he explained in depth. Vig said they solved it by removing the weak link so that the system could operate continuously. The solution included an in-memory distributed datastore called MemDS.

The second lesson learned was that designing systems for predictability over absolute efficiency improves system stability.

Lastly, Vig summarized the lessons learned (two of which were discussed thoroughly in the talk) and the key takeaways.


