Search code examples
mongodbmongodb-querymongodb-indexes

Indexing MongoDB for sort consistency


The MongoDB documentation says that MongoDB doesn't store documents in a collection in a particular order. So if you have this collection:

db.restaurants.insertMany( [
   { "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan"},
   { "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens"},
   { "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn"},
   { "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan"},
   { "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn"},
] );

and sorting like this:

db.restaurants.aggregate(
   [
     { $sort : { borough : 1 } }
   ]
)

Then the sort order can be inconsistent since:

the borough field contains duplicate values for both Manhattan and Brooklyn. Documents are returned in alphabetical order by borough, but the order of those documents with duplicate values for borough might not to be the same across multiple executions of the same sort.

To return a consistent result it's recommended to modify the query to:

db.restaurants.aggregate(
   [
     { $sort : { borough : 1, _id: 1 } }
   ]
)

My question relates to the efficiency of such a query. Let's say you have millions of documents, should you create a compound index, something like { borough: 1, _id: -1 }, to make it efficient? Or is it enough to index { borough: 1 } due to the, potentially, special nature of the _id field?

I'm using MongoDB 4.4.


Solution

  • If you need stable sort, you will have to sort on both the fields and for performant query you will need to have a compound index on both the fields.

    { borough: 1, _id: -1 }