Jepsen Disputes MongoDB’s Data Consistency Claims

In an article titled MongoDB and Jepsen, MongoDB claimed that its database passed “the industry’s toughest data safety, correctness, and consistency Tests”. In response, Jepsen published an article stating that MongoDB 3.6.4 had in fact failed their tests, and that the newer MongoDB 4.2.6 has more problems, including “retrocausal transactions”, in which a transaction's operations appear out of order so that a read can see the result of a future write.

Jepsen LLC’s response begins with this reply to Maxime Beugnet on their official Twitter feed:

I have to admit raising an eyebrow when I saw that web page. In that report, MongoDB lost data and violated causal by default. Somehow that became "among the strongest data consistency, correctness, and safety guarantees of any database available today"!

The report in question, MongoDB 3.6.4, was written by Kit Patella. The new report, by Kyle Kingsbury, expands on this:

Similarly, MongoDB’s default level of read concern allows aborted reads: readers can observe state that is not fully committed, and could be discarded in the future. As the read isolation consistency docs note, “Read uncommitted is the default isolation level”.

We found that due to these weak defaults, MongoDB’s causal sessions did not preserve causal consistency by default: users needed to specify both write and read concern majority (or higher) to actually get causal consistency. MongoDB closed the issue, saying it was working as designed, and updated their isolation documentation to note that even though MongoDB offers “causal consistency in client sessions”, that guarantee does not hold unless users take care to use both read and write concern majority. A detailed table now shows the properties offered by weaker read and write concerns.

Transaction Isolation Failures

In recent years MongoDB has been heavily promoting its transactional capabilities. But as Jepsen found, transactional support doesn’t work by default. In one test, transactions were used to append values to a document. They found that even with write concern majority at the database/collection level, “transactions appeared to lose acknowledged writes” when using the default write concern at the transactional level. (This can be addressed by explicitly specifying a write concern at the transaction level.)

Clients observed a monotonically growing list of elements until [1 2 3 4 5 6 7], at which point the list reset to [], and started afresh with [8]. This could be an example of MongoDB rollbacks, which is a fancy way of saying “data loss”.

This is bad, but a more subtle question arises: why were we able to read these values at all? After all, read concern linearizable is supposed to show only majority-acknowledged (i.e. durable) writes. The answer is a surprising—but documented—MongoDB design choice:

Operations in a transaction use the transaction-level read concern. That is, any read concern set at the collection and database level is ignored inside the transaction.

Effectively this means “transactions without an explicit read concern downgrade any requested read concern at the database or collection level to a default level of local”, allowing the transaction to read uncommitted data which may be later rolled back.
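The downgrade described above can be written out as a small lookup. This is only a simplified model of the rule as the article states it, not driver code: the function name and parameters are hypothetical, and it deliberately leaves out any other source of defaults.

```python
from typing import Optional

# Default named in the article for transactions with no explicit read concern.
DEFAULT_TXN_READ_CONCERN = "local"


def effective_txn_read_concern(
    txn_level: Optional[str],
    collection_level: Optional[str] = None,
    database_level: Optional[str] = None,
) -> str:
    """Return the read concern that applies to reads inside a transaction.

    The collection- and database-level arguments are accepted only to show
    that they play no part in the result (hypothetical helper, toy model).
    """
    if txn_level is not None:
        return txn_level
    return DEFAULT_TXN_READ_CONCERN


# A collection configured for "linearizable" reads still gets "local"
# inside a transaction that sets no read concern of its own.
downgraded = effective_txn_read_concern(None, collection_level="linearizable")
explicit = effective_txn_read_concern("snapshot", collection_level="linearizable")
```

Under this model, `downgraded` is `"local"` and `explicit` is `"snapshot"`, which is why the requested level has to be repeated on the transaction itself.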

The inverse is also problematic. According to the documentation, “If the transaction does not use write concern 'majority' for the commit, the 'snapshot' read concern provides no guarantee that read operations used a snapshot of majority-committed data.” In other words, the read concern “snapshot” is effectively ignored unless the write concern is also set. And again, this must be done at the transaction level, because transactions ignore the collection and database level settings.

Retrocausal Transactions

Even with snapshot isolation, there were numerous scenarios with unexpected results. Most of them are too complex to summarize here, but one of them really stood out.

In one test, Jepsen researchers told the client to read a document and then append a value to it. At the start of the test, the document contained the sequence [2, 3, 4]. After the client read the value, the document was altered to [1, 2, 3, 4].

This usually worked, but in four transactions the client's read returned [1, 2, 3, 4], the value the document should only have held after that transaction's own write. Kingsbury continues:

This is, of course, impossible: our test submits each transaction’s operations in strict order, and unless MongoDB has built a time machine, it cannot return values which it doesn’t yet know will be written. This suggests that the retrocausal transaction actually ran twice, and on its second run, observed an effect of its own prior execution. This could be another consequence of an inappropriate retry mechanism.

This isn’t the only time the retry mechanism has been blamed.

We found that network partitions could cause MongoDB to duplicate the effects of transactions. Despite never appending the same value to an array twice, we repeatedly observed arrays with multiple copies of the same element.

To understand these behaviors better, researchers tried to disable automatic retries, only to discover that “MongoDB transactions ignore the retryWrites setting, and retry regardless”.
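The hypothesis in the quotes above (a transaction that ran twice, with the second run seeing the first run's write) can be illustrated with a toy model. This is not MongoDB's retry logic: the in-memory store, the blind retry policy, and the values are all assumptions made for the sake of the example.

```python
from typing import Callable, Dict, List


class ToyStore:
    """In-memory stand-in for a database; not MongoDB."""

    def __init__(self) -> None:
        self.docs: Dict[str, List[int]] = {}


def read_then_append(store: ToyStore, key: str, value: int) -> List[int]:
    """Transaction body: read the list at `key`, then append `value`.

    Returns what the read observed.
    """
    observed = list(store.docs.get(key, []))
    store.docs.setdefault(key, []).append(value)
    return observed


def run_with_blind_retry(
    txn: Callable[[], List[int]], lose_first_ack: bool
) -> List[int]:
    """Run `txn`; if the first acknowledgement is lost, run it again.

    The first attempt has already taken effect, but the (hypothetical)
    retry logic cannot tell, so it re-executes the whole transaction.
    """
    result = txn()
    if lose_first_ack:
        result = txn()
    return result


store = ToyStore()
store.docs["doc"] = [1, 2, 3]
observed = run_with_blind_retry(
    lambda: read_then_append(store, "doc", 4), lose_first_ack=True
)
final = store.docs["doc"]
```

Here the client's read returns `[1, 2, 3, 4]`, which already contains the 4 it was about to append, and the stored list ends up as `[1, 2, 3, 4, 4]`. A single blind retry is enough to produce both symptoms in this model: a read of a "future" write and a duplicated element.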

In addition to offering advice to developers on how to more safely use MongoDB, Jepsen recommends that “MongoDB may wish to revise their marketing language to use ‘snapshot isolated’ instead of ‘ACID’”.

Editor's Note: A previous version of this article implied that data loss could always occur when using transactions. This specific problem only occurred when using the default write concern for transactions. However, other anomalies were detected with transactions using write concern majority.

Community comments

  • MongoDB is web scale!

    by Cameron Purdy,

  • Re: MongoDB is web scale!

    by Ben Cotton,

    Lordy, the drama. Stay strong MongoDB ... there is nothing 'failed' by defaulting to a READ_UNCOMMITTED isolation TX policy.

  • Jepsen's failing MongoDB's default TX isolation (READ_UNCOMMITTED) ...

    by Ben Cotton,

    ... may be valid and sound for DIRTY_READ intolerant MongoDB ACID tests, but, should not (in any way) imply MongoDB as being incomplete. I.e. if MongoDB provides *any* setIsolation(READ_COMMITTED); API then the impact of their *default* isolation failings should be noted. Jepsen does some amazing things, but, isolation policies that default to DIRTY_READ tolerance are (typically) acceptable.

  • Re: Jepsen's failing MongoDB's default TX isolation (READ_UNCOMMITTED) ...

    by Jonathan Allen,

    What other databases default to dirty reads?

    What other databases ignore the table level isolation level when you start a transaction?

  • Re: Jepsen's failing MongoDB's default TX isolation (READ_UNCOMMITTED) ...

    by Ben Cotton,

    Agreed. The choice of DIRTY_READ tolerance as the default is rare ... Jepsen's criticisms are valid and sound, but (IMHO) should not *imply* that MongoDB *fails* to accommodate DIRTY_READ intolerant transaction use cases COMPLETELY. They do not accommodate the DIRTY_READ intolerant "out of the box". BTW, good job forensically reporting on the Jepsen <--> MongoDB exchange.