javascript mongodb mongodb-query aggregation-framework mongodb-aggregation

Trouble creating query in MongoDB with subquery

I have a dataset that looks something like this:

{
  "id": "02741544",
  "items": [{
    "item": "A"
  }]
}, {
  "id": "02472691",
  "items": [{
    "item": "A"
  }, {
    "item": "B"
  }, {
    "item": "C"
  }]
}, {

  "id": "01316523",
  "items": [{
    "item": "A"
  }, {
    "item": "B"
  }]
}, {
  "id": "01316526",
  "items": [{
    "item": "A"
  }, {
    "item": "B"
  }]
}, {
  "id": "01316529",
  "items": [{
    "item": "A"
  }, {
    "item": "D"
  }]
},

I'm trying to craft a query that will give me an output that looks like this:

{
  "item": "A",
  "ids": [{
    "id": "02741544"

  }, {
    "id": "02472691"

  }, {
    "id": "01316523"

  }, {
    "id": "01316526"

  }, {
    "id": "01316529"

  }]
}, {
  "item": "B",
  "ids": [{
    "id": "02472691"

  }, {
    "id": "01316523"

  }, {
    "id": "01316526"

  }]
}, {
  "item": "C",
  "ids": [{
    "id": "02472691"

  }]
}, {
  "item": "D",
  "ids": [{
    "id": "02472691"

  }]
},

Basically, I'm trying to get the distinct items from the item array in the object, and then returning an array of ids for each obj that has that item in it's item array.

Solution

Better use the aggregation framework in which you need to run an operation that consists of the following pipeline steps (in the given order):

$unwind - This initial step will flatten the items array i.e. it produces a copy of each document per array entry. This is necessary for processing the documents further down the pipeline as "denormalised" documents which you can aggregate as groups.
$group - This will group the flattened documents by the item subdocument key and create the ids list by using the $push accumulator operator.

-- UPDATE --

As @AminJ pointed out in the comments, if items can have duplicate item values and you don't want duplicate ids in the result you can use $addToSet instead of $push

The following example demonstrates this:

db.collection.aggregate([   
    { "$unwind": "$items" },
    {
        "$group": {
            "_id": "$items.item",
            "ids": { 
                "$push": { "id": "$id" } /* or use 
                "$addToSet": { "id": "$id" } if you don't want duplicate ids */                    
            }
        }
    }
])

Sample Output

{
    "_id" : "A",
    "ids" : [ 
        { "id" : "02741544" }, 
        { "id" : "02472691" }, 
        { "id" : "01316523" }, 
        { "id" : "01316526" }, 
        { "id" : "01316529" }
    ]
}

/* 2 */
{
    "_id" : "B",
    "ids" : [ 
        { "id" : "02472691" }, 
        { "id" : "01316523" }, 
        { "id" : "01316526" }
    ]
}

/* 3 */
{
    "_id" : "C",
    "ids" : [ 
        { "id" : "02472691" }
    ]
}

/* 4 */
{
    "_id" : "D",
    "ids" : [ 
        { "id" : "01316529" }
    ]
}

The result from an aggregate() function is a cursor to the documents produced by the final stage of the aggregation pipeline operation. So if you want the results in an array you can use the cursor's toArray() method which returns an array that contains all the documents from it.

For example:

var pipeline = [    
        { "$unwind": "$items" },
        {
            "$group": {
                "_id": "$items.item",
                "ids": { 
                    "$push": { "id": "$id" } /* or use 
                    "$addToSet": { "id": "$id" } if you don't want duplicate ids */                    
                }
            }
        }
    ],
    results = db.collection.aggregate(pipeline).toArray();

printjson(results);