Uber Engineering Moving from Postgres to MySQL

| by Alex Giamas on Aug 12, 2016. Estimated reading time: 2 minutes |

In a recent blog post, Uber detailed why they have chosen to replace PostgreSQL with MySQL.

Uber’s main problem stems from write amplification in Postgres. Write amplification occurs because an update to a single row that touches an indexed column must update every index on the table, resulting in many disk writes, which can be an even bigger issue on SSDs. The HOT (Heap-Only Tuples) feature can alleviate this and may be a solution in some use cases. Write amplification also leaks into replication: even simple updates cause multiple changes to be transferred over the wire, which can cause major issues in disaster recovery scenarios where data centers are far apart and bandwidth may not be cheap or readily available.
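To make the write count concrete, here is a minimal sketch in Python. It is a toy model, not Postgres code: the function name, the parameters and the simplified HOT rule (no indexed column changed, and the new row version fits on the same page) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpdateCost:
    heap_writes: int   # new row versions written to the table itself
    index_writes: int  # new index entries written

    @property
    def total(self) -> int:
        return self.heap_writes + self.index_writes


def postgres_update_cost(num_indexes: int,
                         indexed_column_changed: bool,
                         page_has_free_space: bool) -> UpdateCost:
    """Toy count of writes for a single-row UPDATE (hypothetical helper).

    Simplified model: every update writes one new row version. If the update
    qualifies for HOT (no indexed column changed and the new version fits on
    the same page), no index entries are written. Otherwise every index on
    the table gets a new entry.
    """
    hot_eligible = (not indexed_column_changed) and page_has_free_space
    index_writes = 0 if hot_eligible else num_indexes
    return UpdateCost(heap_writes=1, index_writes=index_writes)
```

Under this model, a table with five indexes pays six writes for an update that changes an indexed column, and one write when the update qualifies for HOT. Each of those writes also has to travel to every replica.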

Also, during a routine upgrade, a Postgres 9.2 bug caused data corruption in some tables. The bug left some entries unmarked that should have been marked as inactive. It was not possible to calculate how many entries were affected, and because replication happens at the physical layer, there was a risk of corrupting the database indexes.

Postgres also lacks true replica MVCC support. Replicas have to apply the same WAL (Write-Ahead Log) updates as the master. Combined with Postgres’s design of blocking database updates that affect rows held open by a transaction, this can seriously affect long-running transactions. Postgres will kill a long-running transaction that blocks the WAL thread, which can be a problem because application developers may not be aware of it, especially when using an ORM that hides where transactions begin and end.

Again, because replication works at the physical level, all nodes have to be upgraded to a new database release at the same time or replication cannot work. At Uber’s scale, this made upgrading to a new release really problematic. Starting from Postgres 9.4, pglogical addresses this.

As for MySQL, the advantages Uber considered in their design decision include more flexible replication, a lighter thread-per-connection model instead of process-per-connection, and less expensive caching. On the main issue of on-disk representation, the InnoDB storage engine makes compaction more efficient, and an update does not touch many indexes or cause the write amplification seen with Postgres.
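One way to picture the difference in on-disk representation is the sketch below. It assumes a simplified model in which InnoDB-style secondary indexes point to the row's primary key, while Postgres-style indexes point to the row's physical location and the update is not eligible for HOT. The function and the column names are hypothetical.

```python
from typing import Iterable, List, Set


def indexes_rewritten(secondary_indexes: Iterable[str],
                      changed_columns: Set[str],
                      points_to_primary_key: bool) -> List[str]:
    """Return the secondary indexes that need a new entry after an update.

    Toy model with one single-column index per name in `secondary_indexes`:
    - points_to_primary_key=True: index entries reference the primary key,
      so only indexes on changed columns are rewritten.
    - points_to_primary_key=False: index entries reference the physical row
      location, which changes with each new row version, so every index is
      rewritten (assuming the update is not HOT-eligible).
    """
    indexes = list(secondary_indexes)
    if points_to_primary_key:
        return [col for col in indexes if col in changed_columns]
    return indexes
```

For a table indexed on `first_name`, `last_name` and `birth_year`, an update to `birth_year` rewrites one index in the first model and all three in the second.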

Markus Winand, Simon Riggs and Robert Haas have published great rebuttals to Uber’s use case, detailing how these issues can be solved in several use cases and arguing that neither Postgres nor MySQL should be ditched for the other in every case.

Important to read the responses by Simon Riggs

It's important to read the responses to get an accurate and balanced view.
blog.2ndquadrant.com/thoughts-on-ubers-list-of-...
