Tags: database, mongodb, performance, query-performance

MongoDB - pipeline grouping by unique set?


I have a data set that looks something like this:

{
   "model":"1234",
   "trackingNumber":"9123..."
}

These model values are shared by a great many records, and I'm trying to find which combinations represent the most common groupings.

I have a pipeline that gives me what I want, but order matters in the grouping key. For example:

{
   "_id": ["1234", "4321"],
   "count": 69761
},
{
   "_id": ["4321", "1234"],
   "count": 44321
},
...

Is there any way to use _id to group by unique arrays, regardless of order? The only workaround I have thought of so far is to sort by model first and hope that $addToSet preserves insertion order, but that feels hacky and slow, and I doubt the order of operations is guaranteed anyway.
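To illustrate the problem outside MongoDB: group keys compare arrays element-wise in order, so the two arrays above are distinct keys, while sorting each array first makes them equal. A minimal JavaScript sketch with made-up sample values:

```javascript
// Two arrays holding the same set of models, in different orders.
const sets = [["1234", "4321"], ["4321", "1234"]];

// Serialize each array as a group key, once as-is and once after sorting.
const asIs = new Set(sets.map(a => JSON.stringify(a)));
const normalized = new Set(sets.map(a => JSON.stringify([...a].sort())));

console.log(asIs.size);       // 2 — order-sensitive keys stay distinct
console.log(normalized.size); // 1 — sorted arrays collapse to one key
```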

My pipeline:

db.trackingIds.aggregate([
    {
        $match: { $and: [
            ...
        ] }
    },
    {
        $group: {
            _id: "$trackingNumber",
            models: { $addToSet: "$model" },
            count: { $sum: 1 }
        }
    },
    {
        $group: {
            _id: "$models",
            count: { $sum: 1 }
        }
    },
    {
        $sort: {
            count: -1
        }
    }
])

Solution

  • You might $unwind the array, sort by the unwound values, then $group by _id and $push the elements back into an array.

    That way the array is sorted, so the various equivalent sets should test equal.
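The suggestion above can be sketched as a modified pipeline. This is an untested sketch based on the question's collection and field names; the added $unwind/$sort/$group sequence between the two original groupings puts each models array into sorted order before it is used as a grouping key:

```javascript
// Sketch of the fix: normalize each models array by unwinding,
// sorting the unwound values, and pushing them back in order.
const pipeline = [
  // { $match: { ... } },  // original filters elided in the question
  { $group: {
      _id: "$trackingNumber",
      models: { $addToSet: "$model" }
  } },
  { $unwind: "$models" },
  // Sort so that each trackingNumber's models arrive in a fixed order.
  { $sort: { _id: 1, models: 1 } },
  { $group: {
      _id: "$_id",
      models: { $push: "$models" }   // $push preserves sorted input order
  } },
  // Now equivalent sets produce identical arrays and group together.
  { $group: {
      _id: "$models",
      count: { $sum: 1 }
  } },
  { $sort: { count: -1 } }
];
// db.trackingIds.aggregate(pipeline);
```

Note that a $sort immediately after $unwind can be memory-hungry on large collections, so running the aggregation with allowDiskUse may be necessary.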