Search code examples
mongodbsumaggregation

Perform $group and count in mongoDB aggregation


Given that I have a complex grouping requirement, I was wondering what would be the best approach to achieving my desired result. My data (result of $project stage) would look something like this:

{
_id:$id
status:"available"
inspectionStatus:"done"
state:"completed"
category:"One"
},
{
_id:$id
status:"booked"
inspectionStatus:"none"
state:"active"
category:"Two"
},
.
.
.

I have tried using $facet to create multiple buckets since the grouping I am trying to create are aggregations of $status + $state + $inspection, but the execution time is way unacceptable, taking something around 1639763842 milliseconds. I can't use use $accumulator because of mongoDB version (although we can always upgrade to 4.4.x) but I am not sure whether using $accumulator would produce a better response time.

The $facet stage is included:

{
  "available": [
    {"$match": 
      {$and: [
        {"status": "available"},
        {"inspectionStatus": "done"}
      ]}
    }
  ],
  "matched": [
    {"$match": 
      {$and: [
        {"status": "booked"},
        {"state": "booked"}
        ]
      }
    }
  ],
  "inIntake": [ 
    {"$match": 
      {$and: [
        {"status": "available"},
        {"inspectionStatus": {$ne: "done"}}
        ]
      }
    }
  ],
  "active": [
    {"$match": 
      {$and: [
        {"status": "booked"},
        {"state": "active"}
        ]
      }
    }
  ],
  "unreturned":[
    {"$match": 
      {"status": "forceCompleted"}
    }
  ]
}

Solution

  • If you really want to push the logic to the DB, here's a solution -- but you still have to examine the XX field doc by doc:

    db.foo.aggregate([
        {$addFields: {XX: {$switch: {
        branches: [
                { case: {
            $and: [{$eq:["$status","available"]},{$eq:["$inspectionStatus","done"]}]
                }, then:'AVAILABLE' },
                { case: {
            $and: [{$eq:["$status","booked"]},{$eq:["$state","booked"]}]
                }, then:'MATCHED' },
                { case: {
            $and: [{$eq:["$status","available"]},{$ne:["$inspectionStatus","done"]}]
                }, then:'IN_INTAKE' },
                { case: {
            $and: [{$eq:["$status","booked"]},{$eq:["$state","active"]}]
                }, then:'ACTIVE' },
                { case: {
            $eq:["$status","forceCompleted"]
                }, then:'UNRETURNED' },
        ],
            default: null
        }}
                     }}
        ,{$match: {XX: {$ne: null}}}
    ]);
    

    The end-to-end timing on this is actually a bunch of millis better than simple find() because less material is transferred but of course the DB engine is working a little harder processing the pipeline.