I have a dataset that looks something like this:
{
"id": "02741544",
"items": [{
"item": "A"
}]
}, {
"id": "02472691",
"items": [{
"item": "A"
}, {
"item": "B"
}, {
"item": "C"
}]
}, {
"id": "01316523",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316526",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316529",
"items": [{
"item": "A"
}, {
"item": "D"
}]
},
I'm trying to craft a query that will give me an output that looks like this:
{
"item": "A",
"ids": [{
"id": "02741544"
}, {
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}, {
"id": "01316529"
}]
}, {
"item": "B",
"ids": [{
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}]
}, {
"item": "C",
"ids": [{
"id": "02472691"
}]
}, {
"item": "D",
"ids": [{
"id": "02472691"
}]
},
Basically, I'm trying to get the distinct items from the item array in the object, and then returning an array of ids for each obj that has that item in it's item array.
Better use the aggregation framework in which you need to run an operation that consists of the following pipeline steps (in the given order):
$unwind
- This initial step will flatten the items
array i.e. it produces a copy of each document per array entry. This is necessary for processing the documents further down the pipeline as "denormalised" documents which you can aggregate as groups.$group
- This will group the flattened documents by the item
subdocument key and create the ids
list by using the $push
accumulator operator. -- UPDATE --
As @AminJ pointed out in the comments, if items
can have duplicate item values and you don't want duplicate ids in the result you can use $addToSet
instead of $push
The following example demonstrates this:
db.collection.aggregate([
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
])
Sample Output
{
"_id" : "A",
"ids" : [
{ "id" : "02741544" },
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" },
{ "id" : "01316529" }
]
}
/* 2 */
{
"_id" : "B",
"ids" : [
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" }
]
}
/* 3 */
{
"_id" : "C",
"ids" : [
{ "id" : "02472691" }
]
}
/* 4 */
{
"_id" : "D",
"ids" : [
{ "id" : "01316529" }
]
}
The result from an aggregate()
function is a cursor to the documents produced by the final stage of the aggregation pipeline operation. So if you want the results in an array you can use the cursor's toArray()
method which returns an array that contains all the documents from it.
For example:
var pipeline = [
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
],
results = db.collection.aggregate(pipeline).toArray();
printjson(results);