Search code examples
mongodbmongodb-queryaggregation-frameworkmongodb-aggregation

How to use nested grouping in MongoDB


I need to find total count of duplicate profiles per organization level. I have documents as shown below:

{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    }
    "_id" : "1"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    }
    "_id" : "2"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    }
    "_id" : "3"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    }
    "_id" : "4"
}

I have written query which is a group by ProfileId and OrganizationId. The results i am getting as shown below:

Organization    Total
10               2
10               2

But i want to get the sum of total per organization level, that means Org 10 should have one row with sum of 4.

The query i am using as shown below:

 db.getSiblingDB("dbName").OrgProfile.aggregate(
 { $project: { _id: 1, P: "$Profile._id",  O: "$OrganizationId" } },
 { $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} },
 { $match: { c: { $gt: 1 } } });

Any ideas ? Please help


Solution

  • The following pipeline should give you the desired output, whereas the last $project stage is just for cosmetic purposes to turn _id into OrganizationId but is not needed for the essential computation so you may omit it.

    db.getCollection('yourCollection').aggregate([
        { 
            $group: {  
                _id: { org: "$OrganizationId", profile: "$Profile._id" },
                count: { $sum: 1 }
            }
        },
        {
            $group: {
                _id: "$_id.org",
                Total: { 
                    $sum: { 
                        $cond: { 
                            if: { $gte: ["$count", 2] }, 
                            then: "$count", 
                            else: 0
                        }
                    }
                }
            } 
         },
         {
             $project: {
                 _id: 0,
                 Organization: "$_id",
                 Total: 1
             }
         }
    ])
    

    gives this output

    {
        "Total" : 4.0,
        "Organization" : 10
    }
    

    To filter out organizations without duplicates you can use $match which will also result in a simplification of the second $group stage

    ...aggregate([
        { 
            $group: {  
                _id: { org: "$OrganizationId", profile: "$Profile._id" },
                count: { $sum: 1 }
            }
        },
        {
            $match: {
                count: { $gte: 2 } 
            }
        },
        {
            $group: {
                _id: "$_id.org",
                Total: { $sum: "$count" }
            } 
         },
         {
             $project: {
                 _id: 0,
                 Organization: "$_id",
                 Total: 1
             }
         }
    ])