Using MongoCollection.FindAll returns the same document twice


We've run into a bug in our system caused by getting duplicate documents from MongoDB. This happens when using FindAll while updating the collection from another process at the same time.

What is the best practice to avoid this?

We don't mind getting a stale version of a just-updated document, getting just-deleted documents, or missing just-inserted documents.

We've seen that there is a SetSnapshot option for cursors ($snapshot: true), but what are the performance implications? Why isn't it on by default?

We can remove the returned duplicates manually, but that doesn't seem right and would also be a performance hit.


Update:

From our understanding, updates that change the document size may move its location in the collection. If this kind of update happens during the FindAll operation when the $snapshot option is off, the document can be returned twice.
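
To make that mechanism concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not MongoDB's actual storage engine: the collection is a plain list of slots, the cursor walks it front to back, and the hypothetical helper `scan_with_move` relocates one document while the scan is in progress.

```python
def scan_with_move(slots, move_at, doc_id, target=None):
    """Toy model of a storage-order scan with one concurrent relocation.

    slots   : list of document ids in storage order (None = free slot)
    move_at : cursor position at which the other process moves a document
    doc_id  : the document being moved (e.g. because an update made it grow)
    target  : free slot to move into; None means "append at the end"
    """
    slots = list(slots)  # work on a copy so the caller's list is untouched
    seen = []
    pos = 0
    while pos < len(slots):
        if pos == move_at:
            # Simulated concurrent update: the document leaves its old slot...
            slots[slots.index(doc_id)] = None
            # ...and lands either at the end or in an earlier free slot.
            if target is None:
                slots.append(doc_id)
            else:
                slots[target] = doc_id
        if slots[pos] is not None:
            seen.append(slots[pos])
        pos += 1
    return seen


# "a" was already returned, then it moves ahead of the cursor: returned twice.
duplicated = scan_with_move(["a", "b", "c"], move_at=2, doc_id="a")
print(duplicated)  # ['a', 'b', 'c', 'a']

# "c" has not been returned yet, then it moves behind the cursor: never returned.
missed = scan_with_move([None, "a", "b", "c"], move_at=1, doc_id="c", target=0)
print(missed)  # ['a', 'b']
```

The second call shows the other side of the same problem: a document that moves behind the cursor is skipped entirely.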


Update 2:

Removing duplicates manually at the client-side (in memory) is not an option since some documents may be lost (for the same reasons they can appear twice).


Solution

  • There's a FAQ entry for this, which basically says that you can prevent duplicate documents from appearing in your query results by either:

    • Using $snapshot, but this can't be used with sharded collections or sort()
    • Sorting your query on a field that doesn't change and has a unique index

    So use $snapshot if its restrictions don't create problems for you, or sort your results on a static, uniquely indexed field like _id, using hint() if necessary to ensure that index is actually used. Do some profiling with explain() to test out your options.

    $snapshot isn't enabled by default because of its restrictions.
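
As a rough illustration of why sorting on a static, uniquely indexed field helps, here is a toy Python model. It is a sketch under stated assumptions, not MongoDB's real internals: the collection is a list of slots, the "index" is just the sorted list of ids, and `index_order_scan` is a hypothetical helper. The cursor walks the index keys in order and looks up wherever each document currently lives, so a relocation during the scan does not change which documents are returned.

```python
def index_order_scan(slots, move_at, doc_id, target=None):
    """Toy model of a scan driven by a unique index on an immutable key.

    slots   : list of document ids in storage order (None = free slot)
    move_at : scan step at which the other process moves a document
    doc_id  : the document being moved
    target  : free slot to move into; None means "append at the end"
    """
    slots = list(slots)  # work on a copy so the caller's list is untouched
    # The index: one entry per document, ordered by a key that never changes.
    index_keys = sorted(d for d in slots if d is not None)
    seen = []
    for step, key in enumerate(index_keys):
        if step == move_at:
            # Simulated concurrent update that relocates the document.
            slots[slots.index(doc_id)] = None
            if target is None:
                slots.append(doc_id)
            else:
                slots[target] = doc_id
        # Index lookup: find the document's current location, wherever it is.
        location = slots.index(key)
        seen.append(slots[location])
    return seen


# The same two relocations that confuse a storage-order scan:
after_move_to_end = index_order_scan(["a", "b", "c"], move_at=2, doc_id="a")
after_move_to_front = index_order_scan(
    [None, "a", "b", "c"], move_at=1, doc_id="c", target=0
)
print(after_move_to_end)
print(after_move_to_front)
```

In this model each document comes back exactly once in both cases, because the scan order depends only on the key and not on where the document is stored. That is also why the sort field must not change: if the key itself were updated mid-scan, the document could jump ahead of or behind the cursor again.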