mongodb, replication, fault-tolerance

Is MongoDB v2.6's WriteConcern "Broken"?


Edit - Possible Duplicate: To what extent are 'lost data' criticisms still valid of MongoDB? - If I had just punched something into Google differently, I more-or-less would've had this question answered. Sorry for the semi-dupe everyone.


I hate asking this question here as I'm not 100% sure if it conforms to this site's guidelines or not. If it doesn't, I do apologize. Currently I am looking to build an application and was seriously considering MongoDB as the datastore until I came across the two articles below.

My question is specifically about the first issue Emin describes (Emin Gun Sirer responding to MongoDB CTO Jared Rosoff's response to Emin's original article detailing how MongoDB is broken):

MongoDB Is Broken (Original): http://hackingdistributed.com/2013/01/29/mongo-ft/

MongoDB Is Broken (Response to Rosoff): http://hackingdistributed.com/2013/02/07/10gen-response/

These articles are a good year and a half old now. I have been trying to determine whether MongoDB's WriteConcern is still broken (i.e., whether MongoDB is still not fault tolerant in the way Emin describes in Issue #1), but most of the comments and articles on this topic seem to have died out about as quickly as they sprang up (dead silence after February or May, as far as I can tell with Google).

I understand now that MongoDB has set the default WriteConcern to ReceiptAcknowledged, but apparently this (and the even more consistent/fault-tolerant option, Journaled) does not guarantee that a write operation has been written to disk on more than one node.

Could someone please tell me if MongoDB now has a WriteConcern setting that confirms a write operation has been written to disk on more than one node?

Thanks in advance, and again I apologize if I'm asking this question in the wrong place.


Solution

  • Yes, you can set the write concern to w=majority to ensure that the application does not consider the write committed until it would be durable in the face of a single node failure. Here is the relevant documentation:

    http://docs.mongodb.org/manual/core/replica-set-write-concern/

    w=majority guarantees that a majority of nodes have acknowledged the write, but not that a majority have written it to disk. You can additionally set j=1 to guarantee that the primary node has written it to its on-disk journal.

    Going through the scenario of a three node replica set where you set j=1 (journaled at the primary) and w=majority and wait for the majority ack before considering the write to be persistent:

    • If the primary node fails after you have received an acknowledgement, the write is on the primary's disk, and on failover the furthest-forward secondary, which has also seen your write (we know a majority have seen it), will become primary. That secondary may not yet have written the write to disk at the moment of failure, but it will soon. We assumed only a single node failure, so we have implicitly assumed that the secondary does not fail. Your write persists.
    • If a secondary node fails, no election occurs and the primary does not change. Your write persists.
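
To make the two cases above concrete, here is a toy Python model of the three-node scenario. It is only a sketch of the reasoning, not MongoDB's actual replication or election code: the `Node` class, the function names, and the rule "the most up-to-date survivor wins the election" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    has_write: bool = False   # node has received the write (possibly only in memory)
    journaled: bool = False   # node has flushed the write to its on-disk journal


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node set."""
    return n // 2 + 1


def is_acknowledged(nodes, primary):
    """Model of w=majority, j=1: the primary has journaled the write
    and a majority of nodes (primary included) hold it."""
    holders = sum(1 for n in nodes if n.has_write)
    return primary.journaled and holders >= majority(len(nodes))


def survives_single_failure(nodes, primary, failed):
    """Does the write survive if exactly one node fails?

    Assumption (hypothetical, simplified): if the primary fails, the
    surviving node that is furthest forward becomes the new primary.
    """
    if failed is not primary:
        # A secondary failed: no election, the primary keeps the write.
        return primary.has_write
    survivors = [n for n in nodes if n is not failed]
    new_primary = max(survivors, key=lambda n: n.has_write)
    return new_primary.has_write


# Three-node replica set: A is primary, B has seen the write, C lags behind.
a = Node("A", has_write=True, journaled=True)
b = Node("B", has_write=True)
c = Node("C")
nodes = [a, b, c]

acked = is_acknowledged(nodes, a)
results = {n.name: survives_single_failure(nodes, a, n) for n in nodes}
print(acked, results)
```

In this model the acknowledged write survives the loss of any one node. If only the primary holds the write (roughly the w=1 case), the write is not acknowledged at w=majority, and losing the primary loses the write.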