I am using the MongoDB aggregation framework. Suppose I have a collection structured like this:
[
  {
    _id: ObjectId(123),
    name: "john",
    sessionDuration: 29
  },
  {
    _id: ObjectId(456),
    name: "moore",
    sessionDuration: 45
  },
  {
    _id: ObjectId(789),
    name: "john",
    sessionDuration: 25
  },
  {
    _id: ObjectId(910),
    name: "john",
    sessionDuration: 45
  },
  // etc...
]
Documents with the same name belong to one user who has several sessions. For example, John is using the service from three devices, so he has three sessions. His durations are: two below 30 (29, 25) and one below 50 (45).
I want to run a $bucket query with boundaries [0, 30, 50], but each range must count only unique names: a name with several sessions in the same range must be counted once. The result should look like this:
{
time: Unique_Name_Users_Only_Lies_In_This_Boundary,
'30': 1,
'50': 2,
}
John has two sessions shorter than 30, so only one of them should be counted.
What I tried: I first grouped all documents by unique name and then applied $bucket. But this approach also drops John's session with sessionDuration 45.
How can I count only the unique names that fall within each $bucket boundary range?
One option is to use $bucket with $addToSet, and then use $group with $arrayToObject to get your formatting:
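db.collection.aggregate([
  // Bucket by duration and collect each distinct name once per bucket
  {$bucket: {
    groupBy: "$sessionDuration",
    boundaries: [0, 30, 50],
    default: "Other",
    output: {res: {$addToSet: "$name"}}
  }},
  // Turn every bucket into a {k, v} pair: k = bucket _id, v = number of unique names
  {$group: {
    _id: 0,
    res: {$push: {k: {$toString: "$_id"}, v: {$size: "$res"}}}
  }},
  // Merge the pairs into a single document
  {$replaceRoot: {newRoot: {$arrayToObject: "$res"}}}
])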
See how it works on the playground example
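To make the logic easier to follow without a database, here is a minimal plain-JavaScript sketch of what the pipeline computes on the sample documents from the question. The helper name uniqueNamesPerBucket is hypothetical and only mimics the stages; it is not part of MongoDB.

```javascript
// Plain-JavaScript sketch of what the pipeline computes (no MongoDB needed).
// Sample documents from the question; _id is omitted because it does not affect the result.
const docs = [
  { name: "john", sessionDuration: 29 },
  { name: "moore", sessionDuration: 45 },
  { name: "john", sessionDuration: 25 },
  { name: "john", sessionDuration: 45 },
];

// Hypothetical helper: mimics $bucket with output {res: {$addToSet: "$name"}},
// then counts the distinct names per bucket like {$size: "$res"}.
function uniqueNamesPerBucket(docs, boundaries, defaultKey = "Other") {
  const sets = new Map(); // bucket key -> Set of names
  for (const { name, sessionDuration } of docs) {
    // A bucket covers [lower, upper): lower bound inclusive, upper bound exclusive.
    let key = defaultKey;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (sessionDuration >= boundaries[i] && sessionDuration < boundaries[i + 1]) {
        key = String(boundaries[i]); // the bucket is named after its lower boundary
        break;
      }
    }
    if (!sets.has(key)) sets.set(key, new Set());
    sets.get(key).add(name); // a Set ignores repeated names, like $addToSet
  }
  const result = {};
  for (const [key, names] of sets) result[key] = names.size;
  return result;
}

const counts = uniqueNamesPerBucket(docs, [0, 30, 50]);
console.log(counts); // { '0': 1, '30': 2 }
```

John appears twice in the first range but is counted once there, and his 45 session is still counted in the second range together with Moore.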
The _id of a bucket is its lower boundary, so the keys in the result are 0 and 30 rather than 30 and 50. You can manipulate this if you really want, but I don't recommend it.
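If you do want the keys from the question (upper boundaries), one possibility is to rename them on the client after the query returns. This is only a sketch under that assumption: relabelByUpperBoundary is a hypothetical helper, and the input object is what the pipeline would return for the sample data.

```javascript
// Hypothetical helper: rename bucket keys from lower boundary to upper boundary.
function relabelByUpperBoundary(counts, boundaries) {
  const relabelled = {};
  for (const [key, value] of Object.entries(counts)) {
    const i = boundaries.indexOf(Number(key));
    // Keys that are not a lower boundary (e.g. the "Other" default bucket) are kept as-is.
    const hasUpper = i !== -1 && i < boundaries.length - 1;
    relabelled[hasUpper ? String(boundaries[i + 1]) : key] = value;
  }
  return relabelled;
}

const relabelled = relabelByUpperBoundary({ "0": 1, "30": 2 }, [0, 30, 50]);
console.log(relabelled); // { '30': 1, '50': 2 }
```

Doing this outside the pipeline keeps the aggregation itself unchanged, at the cost of having to pass the same boundaries array in two places.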