Search code examples
arraysmongodbspring-datamongodb-queryspring-mongo

Search MongoDB Array of Objects where property has same value for more than 1 array element


I have a Mongo Collection of Products with a Categories field that is an array of objects.

{
    "_id" : ObjectId("XXX"),
    "title" : "Cool Product",
    "details" : "Some Details",
    "categories" : [ 
        {
            "name" : "Cat1",
            "isPrimary" : true
        }, 
        {
            "isPrimary" : true,
            "name" : "Cat2"
        }, 
        {
            "name" : "Cat3"
        }
    ]
}

Because a product can have more than one category, I wanted to force a primary category relationship (one-to-one). However, in data migration some documents have the isPrimary property true for multiple categories in the document. I need to find the products where isPrimary is true for more than one array element in the categories array. This is what I have so far:

db.products.find({ "categories" : { "$elemMatch" : { "isPrimary" : { "$exists" : false}}}})

But this only gives me the results where isPrimary doesn't exist for one of the array elements. I can't figure out how to query isPrimary having the same value on multiple array elements. Also, this will be a Spring query as well:

Query query = new Query();
query.addCriteria(new Criteria().orOperator(
            Criteria.where("categories").elemMatch(Criteria.where("isPrimary").exists(false)),
            Criteria.where("categories").size(0),
            Criteria.where("categories")
            ));
query.with(new Sort(Sort.Direction.ASC, "title"));
return operations.find(query, Product.class);

Solution

  • You need to use aggregation pipeline here:

    db.products.aggregate([
     {$unwind:"$categories"},
     {$match:{"categories.isPrimary":true}},
     {$group:{_id:"$_id", numPrimaries:{$sum:1}}},
     {$match:{numPrimaries:{$gt:1}}}
    ])
    

    This will "unwind" the array of categories, keep only the ones that have isPrimary true, "wind" or group them back by original _id summing how many isPrimary values were true and then filter out the documents that had only one. You will be left with _id values of documents that had multiple categories with isPrimary true.