I have a data set that looks something like this:
{
"model":"1234",
"trackingNumber":"9123..."
}
These models values are shared by very many records, and I'm trying to find which combinations represent the most common groupings.
I have a pipeline that gives me what I want, but order matters there. For example:
{
"_id":["1234","4321"]
"count":69761
},
{
"_id":["4321","1234"]
"count":44321
},
...
Is there any way to use _id to group by unique arrays, regardless of order?
The only solution I have thought of so far is sort by the model and hoping that preserve $addToSet
operation order, but that both feels hacky and feels low performance and I don't suspect order of operations is guaranteed anyways.
My pipeline:
db.trackingIds.aggregate([
{
$match: { $and: [
...
] }
},
{
$group: {
_id : "$trackingNumber",
models: {$addToSet: "$model"},
count : { $sum: 1 }
}
},
{
$group: {
_id : "$models",
count : { $sum: 1 }
}
},
{
$sort: {
count: -1
}
}
])
You might $unwind
the array, sort by the unwound values, then $group
by _id
and $push
the elements back into an array.
That way the array is sorted, so the various equivalent sets should test equal.