Search code examples
node.jsmongodbnode-mongodb-native

How to get documents with non unique array elements?


I have the following MongoDB documents:

{
   _id: ObjectId('09de14821345dda65c471c99'),
   items: [
        _id: ObjectId('34de64871345dfa655471c99'),
        _id: ObjectId('34de64871345dfa655471c91'),
        _id: ObjectId('34de64871345dfa655471c99'),       
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c98'),
   items: [
        _id: ObjectId('24de64871345dfa61271c10'),
        _id: ObjectId('24de64871345dfa61271c11'),
        _id: ObjectId('24de64871345dfa61271c11'),       
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c07'),
   items: [
        _id: ObjectId('24de64871345dfa61271c05'),
        _id: ObjectId('24de64871345dfa61271c06'),
        _id: ObjectId('24de64871345dfa61271c07'),       
   ]
}

I need to find all documents with repeated items array elements. So from the documents above I want to get the following result:

db.collection.documents.find({/** need query*/}).toArray(function (err, documents) {
    console.dir(documents); // documents with id's 09de14821345dda65c471c99 and 09de14821345dda65c471c98
});

How could I do that?


Solution

  • In order to group and match results you will need to use the Aggregation Framework or Map/Reduce rather than a simple find() query.

    Example data

    Your example document include some errors: a few of the ObjectIDs are too short and the array elements should either be embedded documents ({_id: ObjectId(...)}) or simple values.

    For test data I've used:

    db.mydocs.insert([
    {
       _id: ObjectId('09de14821345dda65c471c99'),
       items: [
            ObjectId('34de64871345dfa655471c99'),
            ObjectId('34de64871345dfa655471c91'),
            ObjectId('34de64871345dfa655471c99')      
       ]
    },
    {
       _id: ObjectId('09de14821345dda65c471c98'),
       items: [
            ObjectId('24de64871345ddfa61271c10'),
            ObjectId('24de64871345ddfa61271c11'),
            ObjectId('24de64871345ddfa61271c11')       
       ]
    },
    {
       _id: ObjectId('09de14821345dda65c471c07'),
       items: [
            ObjectId('24de64871345ddfa61271c05'),
            ObjectId('24de64871345ddfa61271c06'),
            ObjectId('24de64871345ddfa61271c07')       
       ]
    }])
    

    Aggregation query

    Here is an aggregation query using the mongo shell:

    db.mydocs.aggregate(
    
        // Unpack items array into stream of documents
        { $unwind: "$items" },
    
        // Group by original document _id and item
        { $group: {
            _id: { _id: "$_id", item: "$items" },
            count: { $sum: 1 }
        }},
    
        // Limit to duplicated array items (1 or more count per document _id)
        { $match: {
            count: { $gt: 1 }
        }},
    
        // (Optional) clean up the result formatting
        { $project: {
            _id: "$_id._id",
            item: "$_id.item",
            count: "$count"
        }}
    )
    

    Sample results

    {
        "_id" : ObjectId("09de14821345dda65c471c98"),
        "count" : 2,
        "item" : ObjectId("24de64871345ddfa61271c11")
    }
    {
        "_id" : ObjectId("09de14821345dda65c471c99"),
        "count" : 2,
        "item" : ObjectId("34de64871345dfa655471c99")
    }