Tags: mongodb, mongodb-query, pymongo, pymongo-3.x

What is the default ordering of Mongo's distinct method?


My inventory collection is as follows:

{ "_id": 1, "dept": "A", "item": { "sku": "111", "color": "red" }, "sizes": [ "S", "M" ] }
{ "_id": 2, "dept": "A", "item": { "sku": "111", "color": "blue" }, "sizes": [ "M", "L" ] }
{ "_id": 3, "dept": "B", "item": { "sku": "222", "color": "blue" }, "sizes": "S" }
{ "_id": 4, "dept": "A", "item": { "sku": "333", "color": "black" }, "sizes": [ "S" ] }

The MongoDB documentation describes the distinct command as follows:

Finds the distinct values for a specified field across a single collection. distinct returns a document that contains an array of the distinct values.

Running distinct on the dept and item.sku fields returns the following:

db.inventory.distinct( "dept" )

outputs: [ "A", "B" ]

db.inventory.distinct( "item.sku" )

outputs: [ "111", "222", "333" ]

This appears to indicate that, by default, distinct returns values in the order in which the documents were inserted into the collection.

But if that were the case, distinct on an array field such as sizes (db.inventory.distinct( "sizes" )) should return [ "S", "M", "L" ]; instead it returns [ "M", "S", "L" ].

This behaviour occurs with only the default index on the _id field and no other indexes.

Any insights into Mongo's distinct implementation would be helpful. Thanks in advance.

PS: My requirement is as follows. Collection A contains user metadata (userId, password, createdDate). I run distinct on A's userId field and create documents in another collection, B, to keep daily snapshots of user-related data such as orders. A typical B document looks like: { date: datetime, userData: [{user0's orders}, {user1's orders} ... ] }

At the end of the month, I need to aggregate user0's order data. New users keep being added to the user metadata, and almost all users have order activity every day, so the default ordering of Mongo's distinct matters here.


Solution

  • The order in which distinct returns its results is not guaranteed, so you should assume it is arbitrary.

    If you need the results in a specific order, sort them explicitly, for example by calling .sort() on the array that distinct returns.
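
    Since the question is tagged pymongo, here is a minimal Python sketch of the same idea: sort the list that distinct returns on the client, instead of relying on whatever order the server produces. The helper name sorted_distinct, the SIZE_ORDER mapping, and the FakeCollection class are hypothetical and exist only so the example runs without a server; with a real pymongo collection you would pass the collection object itself, whose distinct method also returns a list.

```python
def sorted_distinct(collection, field, key=None):
    """Return the distinct values of `field` in a deterministic order.

    `collection` is any object with a pymongo-style `distinct(field)` method
    that returns a list. The sort is done client-side on that list.
    """
    return sorted(collection.distinct(field), key=key)


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection (illustration only)."""

    def __init__(self, docs):
        self.docs = docs

    def distinct(self, field):
        # Collect unique values of a top-level field; array values are
        # flattened, scalar values are taken as-is.
        seen = []
        for doc in self.docs:
            if field not in doc:
                continue
            value = doc[field]
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v not in seen:
                    seen.append(v)
        return seen


inventory = FakeCollection([
    {"_id": 1, "dept": "A", "sizes": ["S", "M"]},
    {"_id": 2, "dept": "A", "sizes": ["M", "L"]},
    {"_id": 3, "dept": "B", "sizes": "S"},
    {"_id": 4, "dept": "A", "sizes": ["S"]},
])

# Assumed business ordering for sizes; plain alphabetical order would
# give L, M, S, which is usually not what is wanted here.
SIZE_ORDER = {"S": 0, "M": 1, "L": 2}

sizes = sorted_distinct(inventory, "sizes", key=SIZE_ORDER.get)
depts = sorted_distinct(inventory, "dept")
```

    For the snapshot use case in the question, it is probably safer still to store each user's userId inside the corresponding userData entry and match on it at aggregation time, so that the month-end result does not depend on array position at all.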