Search code examples
mongodbmongodb-queryaggregation-frameworkspring-data-mongodb

MongoDB: Get newest documents with distinct combination of two values


So I have a question how to form a specific slightly more complicated query to the db. I have a collection with documents something like:

[{
  "_id": { "$oid" : "Object1" },
  "created": { "$date" : "2021-11-14T10:58:01.456Z" },
  "primaryId": "SomeId1",
  "secondaryIdx": 0,
  "otherData" : something
}
{
  "_id": { "$oid" : "Object2" },
  "created": { "$date" : "2021-11-13T10:58:01.456Z" },
  "primaryId": "SomeId2",
  "secondaryIdx": 0,
  "otherData" : something
}
{
  "_id": { "$oid" : "Object3" },
  "created": { "$date" : "2021-11-15T10:58:01.456Z" },
  "primaryId": "SomeId2",
  "secondaryIdx": 1,
  "otherData" : something
}
{
  "_id": { "$oid" : "Object4" },
  "created": { "$date" : "2021-11-16T10:58:01.456Z" },
  "primaryId": "SomeId1",
  "secondaryIdx": 0,
  "otherData" : something
}]

And my resulting collection should have only documents with unique combinations of primaryId + secondaryIdx. Wherever there is more then one document with this combination, I want to get only the newest one. So my result should look like this:

{
  "_id": { "$oid" : "Object2" },
  "created": { "$date" : "2021-11-13T10:58:01.456Z" },
  "primaryId": "SomeId2",
  "secondaryIdx": 0,
  "otherData" : something
}
{
  "_id": { "$oid" : "Object3" },
  "created": { "$date" : "2021-11-15T10:58:01.456Z" },
  "primaryId": "SomeId2",
  "secondaryIdx": 1,
  "otherData" : something
}
{
  "_id": { "$oid" : "Object4" },
  "created": { "$date" : "2021-11-16T10:58:01.456Z" },
  "primaryId": "SomeId1",
  "secondaryIdx": 0,
  "otherData" : something
}]

So the Object1 gets excluded from the result because the primaryId+secondaryIdx is duplicate and Object4 is newer.

I currently have this implemented in my application but I think it would be better the do this at a query level, so my application doesn't have to load unneccesarily big collections and then have to potentially filter out a large portion of them immediately anyway.


Solution

  • You can first $sort by primaryId: 1, secondaryIdx: 1, created: -1. Then do a $group by the primaryId + secondaryIdx then take the first document.

    db.collection.aggregate([
      {
        $sort: {
          primaryId: 1,
          secondaryIdx: 1,
          created: -1
        }
      },
      {
        $group: {
          _id: {
            primaryId: "$primaryId",
            secondaryIdx: "$secondaryIdx"
          },
          lastDoc: {
            $first: "$$ROOT"
          }
        }
      },
      {
        "$replaceRoot": {
          "newRoot": "$lastDoc"
        }
      }
    ])
    

    Here is the Mongo playground for your reference.