So I have a question how to form a specific slightly more complicated query to the db. I have a collection with documents something like:
[{
"_id": { "$oid" : "Object1" },
"created": { "$date" : "2021-11-14T10:58:01.456Z" },
"primaryId": "SomeId1",
"secondaryIdx": 0,
"otherData" : something
}
{
"_id": { "$oid" : "Object2" },
"created": { "$date" : "2021-11-13T10:58:01.456Z" },
"primaryId": "SomeId2",
"secondaryIdx": 0,
"otherData" : something
}
{
"_id": { "$oid" : "Object3" },
"created": { "$date" : "2021-11-15T10:58:01.456Z" },
"primaryId": "SomeId2",
"secondaryIdx": 1,
"otherData" : something
}
{
"_id": { "$oid" : "Object4" },
"created": { "$date" : "2021-11-16T10:58:01.456Z" },
"primaryId": "SomeId1",
"secondaryIdx": 0,
"otherData" : something
}]
And my resulting collection should have only documents with unique combinations of primaryId + secondaryIdx. Wherever there is more then one document with this combination, I want to get only the newest one. So my result should look like this:
{
"_id": { "$oid" : "Object2" },
"created": { "$date" : "2021-11-13T10:58:01.456Z" },
"primaryId": "SomeId2",
"secondaryIdx": 0,
"otherData" : something
}
{
"_id": { "$oid" : "Object3" },
"created": { "$date" : "2021-11-15T10:58:01.456Z" },
"primaryId": "SomeId2",
"secondaryIdx": 1,
"otherData" : something
}
{
"_id": { "$oid" : "Object4" },
"created": { "$date" : "2021-11-16T10:58:01.456Z" },
"primaryId": "SomeId1",
"secondaryIdx": 0,
"otherData" : something
}]
So the Object1 gets excluded from the result because the primaryId+secondaryIdx is duplicate and Object4 is newer.
I currently have this implemented in my application but I think it would be better the do this at a query level, so my application doesn't have to load unneccesarily big collections and then have to potentially filter out a large portion of them immediately anyway.
You can first $sort
by primaryId: 1, secondaryIdx: 1, created: -1
. Then do a $group
by the primaryId + secondaryIdx
then take the first document.
db.collection.aggregate([
{
$sort: {
primaryId: 1,
secondaryIdx: 1,
created: -1
}
},
{
$group: {
_id: {
primaryId: "$primaryId",
secondaryIdx: "$secondaryIdx"
},
lastDoc: {
$first: "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$lastDoc"
}
}
])
Here is the Mongo playground for your reference.