Amazon Announces General Availability of Aurora Multi-Master

Amazon recently announced the general availability of Aurora Multi-Master, which allows reading and writing on multiple MySQL database instances across several Availability Zones (AZs). This improves availability, as the platform no longer needs to trigger a fail-over or promote a read replica when a database instance fails.

Amazon Aurora is a relational database that is compatible with the open-source databases MySQL and PostgreSQL. It is backed by multiple compute nodes and a multi-AZ storage layer, and is fully managed by Amazon Relational Database Service (RDS). The service is similar to those of other large cloud providers, such as Microsoft's Azure Database Services and Google's Cloud SQL. The new multi-master feature is currently only available on the MySQL-compatible edition, is limited to specific regions, and supports a maximum of two writer nodes.

Previously, when only single-master mode was available, a failure of an Aurora database instance resulted in a fail-over to another instance. This fail-over came with downtime on write operations, which, depending on the database, could run up to thirty seconds in the case of a CNAME record adjustment, or even sixty seconds if the platform needed to restart the database. With the introduction of Multi-Master, Amazon now offers a way to eliminate the need for a fail-over. Instead, applications can redirect their read and write operations to the other instance, which is already up and running and was already processing other database operations. Preferably, a dedicated connection manager should coordinate this, for example by implementing a singleton that keeps track of connections and database health, and distributes database calls logically across the nodes.

Source: https://aws.amazon.com/blogs/database/building-highly-available-mysql-applications-using-amazon-aurora-mmsr/
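As an illustration of the connection manager idea described above, here is a minimal Python sketch rather than a definitive implementation. The class name `ConnectionManager`, its methods, and the endpoint strings are all hypothetical and not part of any AWS library. It only tracks health flags and picks an endpoint for a routing key (for example a shard name); opening real connections and running health checks are left out.

```python
import threading
import zlib


class ConnectionManager:
    """Hypothetical process-wide singleton that tracks writer health.

    It only decides which endpoint a call should go to; it does not
    open database connections itself.
    """

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        with cls._instance_lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._endpoints = []   # registration order is the routing order
                instance._healthy = {}     # endpoint -> bool
                instance._state_lock = threading.Lock()
                cls._instance = instance
        return cls._instance

    def register(self, endpoint):
        """Add a writer endpoint (or mark a known one healthy again)."""
        with self._state_lock:
            if endpoint not in self._healthy:
                self._endpoints.append(endpoint)
            self._healthy[endpoint] = True

    def set_health(self, endpoint, healthy):
        """Record the result of a health check made elsewhere."""
        with self._state_lock:
            if endpoint not in self._healthy:
                raise KeyError(endpoint)
            self._healthy[endpoint] = healthy

    def endpoint_for(self, routing_key):
        """Return the preferred healthy endpoint for a routing key.

        A stable hash keeps the same key on the same writer, so that
        unrelated keys tend to land on different writers. If the
        preferred writer is down, the next healthy one is used.
        """
        with self._state_lock:
            count = len(self._endpoints)
            if count == 0:
                raise RuntimeError("no writer endpoints registered")
            start = zlib.crc32(routing_key.encode("utf-8")) % count
            for offset in range(count):
                candidate = self._endpoints[(start + offset) % count]
                if self._healthy[candidate]:
                    return candidate
        raise RuntimeError("no healthy writer endpoint available")
```

An application would register both writer endpoints at start-up, call `endpoint_for(...)` before each database call, and call `set_health(endpoint, False)` when a health check or a connection error indicates that a node is unavailable.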

When using Aurora Multi-Master, each node in the cluster is both a reader and a writer, and thus applications can use all nodes to handle their workloads. This provides high availability for the databases, although it may also introduce issues surrounding replication and consistency. The platform handles this by using a quorum of storage nodes that either approves or rejects the changes the master nodes present to it. For optimal performance, the connection manager should avoid write conflicts as much as possible by distributing connections so that different writers rarely update the same pages. On this subject, Mukund Sundararajan, solutions architect at Amazon Web Services, provides several best practices to use when designing applications.

Avoid performing overlapping page updates from concurrent writers. If you have a sharded database, it would be a good idea to assign a writer to a shard and update shards through the assigned writer. The mapping is only logical in the application layer. Physically, the data in the storage volume is visible to all writer nodes. Transactions that span shards can still be executed from a single writer.

When a conflict is raised by the database node to the application layer, retry the transaction. Techniques such as exponential backoff give the buffer pool time to catch up with replication and reflect the most recent change to the page touched by the transaction, increasing the chances of success.

Based on your own application's design and needs, route queries to writers in a way that achieves an acceptable write conflict ratio, equal writer utilization, and the best possible availability.
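The retry advice above could be sketched in Python as follows. This is a minimal sketch under stated assumptions: `WriteConflictError` is a hypothetical exception that the application's data layer would raise after recognizing a conflict error from its driver, and `run_with_retry` with its parameters is an illustrative name, not an AWS API. Random jitter is added to the backoff so that concurrent writers do not retry in lockstep.

```python
import random
import time


class WriteConflictError(Exception):
    """Hypothetical: raised by the data layer when a node reports a write conflict."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.05,
                   max_delay=1.0, sleep=time.sleep):
    """Run `transaction` (a callable), retrying on write conflicts.

    The wait before retry n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)], i.e. exponential backoff
    with jitter. The last conflict is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return transaction()
        except WriteConflictError:
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. The transaction passed in should be safe to execute again from the start, since a conflicting attempt is rolled back as a whole.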

By default, Aurora uses read-after-write consistency, where a read is consistent if it is performed on the same node that wrote the data. It may take several milliseconds before the write becomes consistent on the other nodes. Additionally, multi-master clusters provide a session-level option called global read-after-write (GRAW). With GRAW, a session sees only consistent data, no matter which node performed the update. This option does come with a performance penalty, and thus Amazon advises applying it only to the specific queries that need this behavior.
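To limit GRAW to the queries that need it, an application could switch the session setting on just before such a query and back off afterwards. The Python sketch below assumes that the setting is the session variable `aurora_mm_session_consistency_level`, with the values `REGIONAL_RAW` (GRAW) and `INSTANCE_RAW` (default); these names are recalled from the Aurora documentation and should be verified against the current user guide. The helper `global_read_after_write` is a hypothetical name, and `cursor` can be any object with a DB-API style `execute` method.

```python
from contextlib import contextmanager

# Assumed variable name and values; verify against the Aurora user guide.
_SET_LEVEL = "SET SESSION aurora_mm_session_consistency_level = '{}'"


@contextmanager
def global_read_after_write(cursor):
    """Enable GRAW on the cursor's session for the duration of the block."""
    cursor.execute(_SET_LEVEL.format("REGIONAL_RAW"))
    try:
        yield cursor
    finally:
        # Return to the default so later queries avoid the extra cost.
        cursor.execute(_SET_LEVEL.format("INSTANCE_RAW"))
```

A read that must see writes made through the other node would then run inside `with global_read_after_write(cursor):`, while all other queries on the session keep the cheaper default behavior.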
