I have a database of about 50k "company" records.
I want to find duplicates by matching:
name
and street
fields.OR
phone
field(I consider both #1 and #2 unique identifiers, so either can be used to find duplicates.)
I am able to write the $group statement to match based on #1:
_id: {
name: '$name',
street: 'street'
},
uniqueIds: {
$addToSet: '$_id'
},
count: {
$sum: 1
}
I tried something like this to match one or the other:
_id: {
$or: [
{name: '$name', street: '$street'},
{phone: '$phone}
]
}...
But that just returns a boolean.
How to group by filtering for #1 or #2 above in the same aggregation?
One option is to use $facet:
db.company.aggregate([
{ $facet:{
by_name_street:[ {$group:{ _id:{n:"$name",str:"$street" }, cnt:{$sum:1} }} ] ,
by_phone:[ {$group:{ _id:"$phone" , cnt:{$sum:1} }} ]
} }
])