10gen: MongoDB’s Fault Tolerance Is Not Broken
A Cornell University professor claims MongoDB’s fault tolerance system is “broken by design”. 10gen responds through its Technical Director, rejecting the claims.
Emin Gün Sirer, an Associate Professor at Cornell University and developer of HyperDex – a key-value data store supporting ACID transactions – has recently complained about MongoDB’s fault tolerance system in a blog post entitled Broken by Design: MongoDB Fault Tolerance. The main problem reported by Sirer is the way MongoDB deals with replicating writes across multiple nodes for fault tolerance purposes. He asks the question “What does it mean when MongoDB says that a write (aka insert) is complete?” and lets the reader choose one of three reasonable answers:
- When it has been written to all the replicas.
- When it has been written to one of the replicas.
- When it has been written to a majority of the replicas.
Actually, none of the answers is correct, according to Sirer:
The answer is none of the above. MongoDB v2.0 will consider a write to be complete, done, finito as soon as it has been buffered in the outgoing socket buffer of the client host. Read that sentence over again. It looks like this:
The data never left the front-end box, no packet was sent on the network, and yet MongoDB believes it to be safely committed for posterity. It swears it mailed the check last week, while it's sitting right there in the outgoing mail pile in the living room.
This could be highly problematic in case of a hardware failure, when the data is no longer replicated.
Sirer makes a number of other statements regarding the getLastError() API call which checks “the number of replicas to which the last write was propagated”:
- It’s slow
- It doesn’t work pipelined
- It doesn't work multithreaded
He also states that Write Concerns are broken, concluding:
So, MongoDB is broken, and not just superficially so; it's broken by design. If you're relying on its fault tolerance, you're probably doing it wrong.
This is not the first time MongoDB is criticized (see MongoDB’s Reliability Questioned on InfoQ).
For a fair treatment, InfoQ has contacted 10gen, the company behind MongoDB, which replied through Jared Rosoff, Technical Director, to the 5 issues raised by Sirer:
Jared Rosoff: All of these questions relate to MongoDB’s Write Concern feature, so we’ll begin with an explanation of how Write Concern’s work before addressing the specific points in the article. Write Concerns are the mechanism by which developers specify the level of error checking and reporting applied to write operations on the database including inserts, updates, and deletes. Write Concerns allow control over multiple wait conditions on the database and driver. The following write concern levels are supported:
- Errors Ignored – No error checking is performed at all. You should not use this option in normal operation.
- Unacknowledged – Client side error checking is performed, catching things like network connectivity errors. However, the client does not wait for any response from the server.
- Receipt Acknowledged (default mode) – The client waits for the database server to process the write operation. The driver will be informed of client, network, or server side errors that occur during processing of the request.
- Journaled – The client waits for the write operation to be written to disk on the write ahead log of the server. The journal is written to disk once every 100ms, so use of this option may incur up to a 100ms delay before the write is acknowledged.
When using Receipt Acknowledged, or Journaled write concern level, the driver can also specify a replication factor for the write, indicating how many replicas of the write must exist before the acknowledgement should be sent. The replication factor can be specified as an integer indicating the number of copies that should exist, “majority” - a majority of the servers in the replica set, or as a string, referencing a named error mode in the server. With this background, let’s discuss the issues raised in his blog post.
Issue #1: MongoDB lies when it says that a write has succeeded
Jared Rosoff: This is simply not true. MongoDB honors the write concern provided with the write. The scenario described in the blog post corresponds to the Unacknowledged write concern level, which the developer must explicitly opt-in to when sending a write.
The default behavior in previous driver versions used to use Unacknowledged as the default behavior, with the expectation that developers would choose a more appropriate write concern based on their needs. However, we found that many developers simply used the default and this created confusion and problems. So we changed the default (http://blog.mongodb.org/post/36666163412/introducing-mongoclient ).
Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.
Issue #2. Using getLastError slows down write operations
Jared Rosoff: GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns (http://docs.mongodb.org/manual/reference/command/getLastError/). Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.
Issue #3. getLastError doesn’t work pipelined
Jared Rosoff: Depending on your application, you may want one of several behaviors. If you want to perform multiple inserts insert(a), insert(b), insert(c) and have that sequence stop if one of the inserts failed, then you can specify a write concern with each insert, which causes a getLastError to be inserted after each insert. For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. It’s important for the developer to consider their desired behavior and failure modes.
Issue #4. getLastError doesn’t work multi-threaded
Jared Rosoff: Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. All of the official MongoDB drivers provide mechanisms to ensure correct behavior of getLastError in multi-threaded environment. For simple operations (e.g. an insert with a write concern), the driver can send the getLastError message over the same connection used to send the insert before returning the connection to the connection pool. For complex operations, the drivers provide requestStart() and requestDone() API commands that bind the connection to that thread, ensuring the all operations and getLastError commands are sent over the same connection for the duration of the client’s operation. For example, see the Java Driver’s documentation on concurrency (http://docs.mongodb.org/ecosystem/drivers/java-concurrency/)
Issue #5. Write Concerns are broken
Jared Rosoff: As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.
So what 10gen is saying is?
If that really is the case, doesn't that cast doubt on the scalability claims of MongoDB, when other NoSql database default to write acknowledged? Quite honestly, even having that setting doesn't make sense. You at minimum want to know the insert was successful on 1 node. Beyond that, it's up to the developer.
I really hope 10gen deprecates that feature, since it's just asking for abuse.
Re: So what 10gen is saying is?
Being able to asynchronously queue writes on the client side is a handy feature for a small number of apps, but it definitely shouldn't have been the default.
And yes, the performance and scalability claims that I've seen were all based on the "Unacknowledged" default setting. Among other things, I'm guessing that it hid the global write lock.
Cameron Purdy | Oracle
And in the interest of full disclosure, I do work at Oracle, but the opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
Sirer is right...
Regarding the handling of the write concern. It is insane having to configure the write concern level on the application level. This has to be a configuration option. A developer should never deal with such aspects. In Cassandra you specify the replication options etc. as part of the keyspace configuration. Column families of a particular keyspace share the same replication model and developer must not care about which database operation should be carried out with with write concern. The MongoDB approach here is just sick.
Re: So what 10gen is saying is?
Followup by Emin Gün Sirer
HyperDex looks quite interesting
HyperDex looks very interesting in terms of its claims but very little is known about it from independent sources.
Indeed an efficent multi-node full ACID capability is quite an achievment. However, the add-on called 'Warp', which gives HyperDex its ACID capabilities seems to be a commercial option only (from what I read).
Re: HyperDex looks quite interesting
jepsen proved prof Sirer was right
Re: Sirer is right...