BT

10gen: MongoDB’s Fault Tolerance Is Not Broken

by Abel Avram on Feb 07, 2013 |

A Cornell University professor claims MongoDB’s fault tolerance system is “broken by design”. 10gen responds through its Technical Director, rejecting the claims.

Emin Gün Sirer, an Associate Professor at Cornell University and developer of HyperDex – a key-value data store supporting ACID transactions – has recently complained about MongoDB’s fault tolerance system in a blog post entitled Broken by Design: MongoDB Fault Tolerance. The main problem reported by Sirer is the way MongoDB deals with replicating writes across multiple nodes for fault tolerance purposes. He asks the question “What does it mean when MongoDB says that a write (aka insert) is complete?” and lets the reader choose one of three reasonable answers:

  1. When it has been written to all the replicas.
  2. When it has been written to one of the replicas.
  3. When it has been written to a majority of the replicas.

Actually, none of the answers is correct, according to Sirer:

The answer is none of the above. MongoDB v2.0 will consider a write to be complete, done, finito as soon as it has been buffered in the outgoing socket buffer of the client host. Read that sentence over again. It looks like this:

The data never left the front-end box, no packet was sent on the network, and yet MongoDB believes it to be safely committed for posterity. It swears it mailed the check last week, while it's sitting right there in the outgoing mail pile in the living room.

This could be highly problematic in case of a hardware failure, when the data is no longer replicated.

Sirer makes a number of other statements regarding the getLastError() API call which checks “the number of replicas to which the last write was propagated”:

  • It’s slow
  • It doesn’t work pipelined
  • It doesn't work multithreaded

He also states that Write Concerns are broken, concluding:

So, MongoDB is broken, and not just superficially so; it's broken by design. If you're relying on its fault tolerance, you're probably doing it wrong.

This is not the first time MongoDB is criticized (see  MongoDB’s Reliability Questioned on InfoQ).

For a fair treatment, InfoQ has contacted 10gen, the company behind MongoDB, which replied through Jared Rosoff, Technical Director, to the 5 issues raised by Sirer:

Jared Rosoff: All of these questions relate to MongoDB’s Write Concern feature, so we’ll begin with an explanation of how Write Concern’s work before addressing the specific points in the article. Write Concerns are the mechanism by which developers specify the level of error checking and reporting applied to write operations on the database including inserts, updates, and deletes. Write Concerns allow control over multiple wait conditions on the database and driver. The following write concern levels are supported:

  • Errors Ignored – No error checking is performed at all. You should not use this option in normal operation.
  • Unacknowledged – Client side error checking is performed, catching things like network connectivity errors. However, the client does not wait for any response from the server.
  • Receipt Acknowledged (default mode) – The client waits for the database server to process the write operation. The driver will be informed of client, network, or server side errors that occur during processing of the request.
  • Journaled – The client waits for the write operation to be written to disk on the write ahead log of the server. The journal is written to disk once every 100ms, so use of this option may incur up to a 100ms delay before the write is acknowledged.

When using Receipt Acknowledged, or Journaled write concern level, the driver can also specify a replication factor for the write, indicating how many replicas of the write must exist before the acknowledgement should be sent. The replication factor can be specified as an integer indicating the number of copies that should exist, “majority” - a majority of the servers in the replica set, or as a string, referencing a named error mode in the server. With this background, let’s discuss the issues raised in his blog post.

Issue #1: MongoDB lies when it says that a write has succeeded

Jared Rosoff: This is simply not true. MongoDB honors the write concern provided with the write. The scenario described in the blog post corresponds to the Unacknowledged write concern level, which the developer must explicitly opt-in to when sending a write.

The default behavior in previous driver versions used to use Unacknowledged as the default behavior, with the expectation that developers would choose a more appropriate write concern based on their needs. However, we found that many developers simply used the default and this created confusion and problems. So we changed the default (http://blog.mongodb.org/post/36666163412/introducing-mongoclient ).

Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.

Issue #2. Using getLastError slows down write operations

Jared Rosoff: GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns (http://docs.mongodb.org/manual/reference/command/getLastError/). Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.

Issue #3. getLastError doesn’t work pipelined

Jared Rosoff: Depending on your application, you may want one of several behaviors. If you want to perform multiple inserts insert(a), insert(b), insert(c) and have that sequence stop if one of the inserts failed, then you can specify a write concern with each insert, which causes a getLastError to be inserted after each insert. For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. It’s important for the developer to consider their desired behavior and failure modes.

Issue #4. getLastError doesn’t work multi-threaded

Jared Rosoff: Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. All of the official MongoDB drivers provide mechanisms to ensure correct behavior of getLastError in multi-threaded environment. For simple operations (e.g. an insert with a write concern), the driver can send the getLastError message over the same connection used to send the insert before returning the connection to the connection pool. For complex operations, the drivers provide requestStart() and requestDone() API commands that bind the connection to that thread, ensuring the all operations and getLastError commands are sent over the same connection for the duration of the client’s operation. For example, see the Java Driver’s documentation on concurrency (http://docs.mongodb.org/ecosystem/drivers/java-concurrency/)

Issue #5. Write Concerns are broken

Jared Rosoff: As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

So what 10gen is saying is? by peter lin

They purposely set the default to not fault tolerant pre-2.0 release to look better on benchmarks?

If that really is the case, doesn't that cast doubt on the scalability claims of MongoDB, when other NoSql database default to write acknowledged? Quite honestly, even having that setting doesn't make sense. You at minimum want to know the insert was successful on 1 node. Beyond that, it's up to the developer.

I really hope 10gen deprecates that feature, since it's just asking for abuse.

Re: So what 10gen is saying is? by Cameron Purdy

Peter -

Being able to asynchronously queue writes on the client side is a handy feature for a small number of apps, but it definitely shouldn't have been the default.

And yes, the performance and scalability claims that I've seen were all based on the "Unacknowledged" default setting. Among other things, I'm guessing that it hid the global write lock.

Peace,

Cameron Purdy | Oracle

And in the interest of full disclosure, I do work at Oracle, but the opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Sirer is right... by Andreas Jung

the complete fault tolerance model starting with replica sets and ending with the brain dead sharding approach is broken by design.

Regarding the handling of the write concern. It is insane having to configure the write concern level on the application level. This has to be a configuration option. A developer should never deal with such aspects. In Cassandra you specify the replication options etc. as part of the keyspace configuration. Column families of a particular keyspace share the same replication model and developer must not care about which database operation should be carried out with with write concern. The MongoDB approach here is just sick.

Re: So what 10gen is saying is? by Alex Popescu

Unfortunately except the small changes since 2.2, which by the way are not addressing all the problems raised, I feel the replies are pretty much ignoring the original arguments. Put them side by side.

Followup by Emin Gün Sirer by Jonathan Allen

HyperDex looks quite interesting by Faisal Waris

Mongo is the leader in NoSQL so by contrasting with Mongo Sirer has precipitated interest in HyperDex.

HyperDex looks very interesting in terms of its claims but very little is known about it from independent sources.

Indeed an efficent multi-node full ACID capability is quite an achievment. However, the add-on called 'Warp', which gives HyperDex its ACID capabilities seems to be a commercial option only (from what I read).

Re: HyperDex looks quite interesting by Andreas Jung

And MongoDBs popularity is based on buzz and marketing - and not on technical competence and facts. 10gen is the new Oracle...making hype with a half-baked technology and trying to rip you off with horrendous support fees in case your MongoDB installation is fucked up. This is was MongoDB is all about...increasing the value of 10gen with half-baked technology.

jepsen proved prof Sirer was right by milan simonovic

It's not possible to configure mongodb (as of 2.4) so it doesn't lose data:
aphyr.com/posts/284-call-me-maybe-mongodb

Re: Sirer is right... by milan simonovic

cassandra isn't much better: aphyr.com/posts/294-call-me-maybe-cassandra

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

9 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT