I need to find total count of duplicate profiles per organization level. I have documents as shown below:
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "75"
}
"_id" : "1"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "75"
}
"_id" : "2"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "77"
}
"_id" : "3"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "77"
}
"_id" : "4"
}
I have written query which is a group by ProfileId and OrganizationId. The results i am getting as shown below:
Organization Total
10 2
10 2
But i want to get the sum of total per organization level, that means Org 10 should have one row with sum of 4.
The query i am using as shown below:
db.getSiblingDB("dbName").OrgProfile.aggregate(
{ $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
{ $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} },
{ $match: { c: { $gt: 1 } } });
Any ideas ? Please help
The following pipeline should give you the desired output, whereas the last $project
stage is just for cosmetic purposes to turn _id
into OrganizationId
but is not needed for the essential computation so you may omit it.
db.getCollection('yourCollection').aggregate([
{
$group: {
_id: { org: "$OrganizationId", profile: "$Profile._id" },
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.org",
Total: {
$sum: {
$cond: {
if: { $gte: ["$count", 2] },
then: "$count",
else: 0
}
}
}
}
},
{
$project: {
_id: 0,
Organization: "$_id",
Total: 1
}
}
])
gives this output
{
"Total" : 4.0,
"Organization" : 10
}
To filter out organizations without duplicates you can use $match
which will also result in a simplification of the second $group
stage
...aggregate([
{
$group: {
_id: { org: "$OrganizationId", profile: "$Profile._id" },
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{
$group: {
_id: "$_id.org",
Total: { $sum: "$count" }
}
},
{
$project: {
_id: 0,
Organization: "$_id",
Total: 1
}
}
])