
Meta Switches to MySQL Raft to Improve Reliability and Operational Simplicity


Meta is rolling out MySQL Raft in its data centers to replace its current MySQL semisynchronous databases. The new consensus engine simplifies operations and lets the MySQL servers themselves take responsibility for promotions and membership changes.

One of the largest MySQL deployments in the world, Meta’s MySQL datastore is a massively sharded, geo-replicated deployment with millions of shards. Powering the social graph and services like Messaging, Ads, and Feed, the cluster holds petabytes of data, running on thousands of servers in several regions and data centers. Anirban Rahut, Abhinav Sharma, Yichen Shen, and Ahsanul Haque, software and production engineers at Meta, explain:

Over the last few years, we have implemented MySQL Raft, a Raft consensus engine that was integrated with MySQL to build a replicated state machine. We have migrated a large portion of our deployment to MySQL Raft and plan to fully replace the current MySQL semisynchronous databases with it.

According to the team, the new MySQL deployment provides higher reliability, provable safety, significant improvements in failover time, and operational simplicity without compromising write performance.

Previously, replication at Meta used the MySQL semisynchronous (semisync) replication protocol. The primary used semisync replication to two log-only replicas (logtailers) within its own region for sub-millisecond latency, while regular asynchronous MySQL primary-to-replica replication distributed data to other regions. The team explains the challenges they were facing:

To help guarantee safety and avoid data loss during the complex promotion and failover operations, several automation daemons and scripts would use locking, orchestration steps, a fencing mechanism, and SMC, a service discovery system. It was a distributed setup, and it was difficult to accomplish this atomically. The automation became more complex and harder to maintain over time as more and more corner cases needed to be patched.


The team decided instead to take a completely new approach, enhancing MySQL and making it a truly distributed system: Meta switched to Raft, with control plane and data plane operations now part of the same replicated log. Mark Callaghan, previously MTS at Facebook and distinguished engineer at MongoDB, comments:

MySQL + Raft go together like peanut butter and jelly or pizza, ham, and pineapple.

Shrikanth Shankar, senior director at Databricks, highlights the complexity of the change:

People joke about replacing the aircraft engine while the plane is flying but that’s what this project was. Kudos to the team for pulling this off!

Peter Zaitsev, founder at Percona and open source advocate, asks instead:

Why is Facebook building MySQL Raft rather than using or improving MySQL Group Replication?

MySQL Raft is based on Apache Kudu: Meta modified it for the needs of MySQL and published the fork as an open-source project, kuduraft. Features added to kuduraft include FlexiRaft, an option to support two different intersecting quorums, and proxying, the ability to route traffic through an intermediate proxy node to reduce network bandwidth. In addition, compression and log abstraction improvements allow binary log payloads to be compressed before distribution and make different physical log file implementations possible.
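
To illustrate what "two different intersecting quorums" can mean in general, here is a minimal Python sketch. It is not kuduraft or FlexiRaft code: the function names, the five-member voter set, and the assumption that quorums are defined purely by size are hypothetical simplifications. The sketch checks that any group of servers acknowledging a write shares at least one member with any group electing a new leader, which is the general property that lets a new leader learn about committed entries.

```python
from itertools import combinations


def quorums_always_intersect(voters, commit_size, election_size):
    """Brute-force check: does every possible commit quorum share at
    least one voter with every possible election quorum?"""
    for commit_quorum in combinations(voters, commit_size):
        for election_quorum in combinations(voters, election_size):
            if not set(commit_quorum) & set(election_quorum):
                return False
    return True


def sizes_intersect(total_voters, commit_size, election_size):
    """Closed-form version of the same check for size-based quorums:
    two groups must overlap when their sizes add up to more than
    the total number of voters."""
    return commit_size + election_size > total_voters


# Hypothetical five-member voter set, for illustration only.
voters = ["a", "b", "c", "d", "e"]

# Classic majority quorums (3 of 5 for both commit and election).
print(quorums_always_intersect(voters, 3, 3))

# A smaller commit quorum is safe if the election quorum grows to match.
print(quorums_always_intersect(voters, 2, 4))

# Shrinking the commit quorum without growing the election quorum is unsafe.
print(quorums_always_intersect(voters, 2, 3))
```

The trade-off the sketch hints at is that a smaller commit quorum can lower write latency, at the cost of needing more servers to agree during a leader election. How FlexiRaft actually defines its quorums is described in Meta's own material and may differ from this size-only model.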

