I have a complex grouping requirement and I am wondering what the best approach would be to achieve my desired result. My data (the result of a `$project` stage) looks something like this:
{
  _id: $id,
  status: "available",
  inspectionStatus: "done",
  state: "completed",
  category: "One"
},
{
  _id: $id,
  status: "booked",
  inspectionStatus: "none",
  state: "active",
  category: "Two"
},
...
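For reference, the `$project` stage producing these documents is roughly the following (a sketch; the collection name `foo` and the exact field list are assumptions):

db.foo.aggregate([
  // keep only the fields the grouping logic looks at
  {$project: {status: 1, inspectionStatus: 1, state: 1, category: 1}},
  // ...the grouping stages discussed below follow here
]);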
I have tried using `$facet` to create multiple buckets, since the groupings I am trying to create are combinations of `status` + `state` + `inspectionStatus`, but the execution time is unacceptably long. I can't use `$accumulator` because of our MongoDB version (although we could always upgrade to 4.4.x), and I am not sure whether `$accumulator` would produce a better response time anyway.
The `$facet` stage is:
{
  "available": [
    {"$match": {$and: [
      {"status": "available"},
      {"inspectionStatus": "done"}
    ]}}
  ],
  "matched": [
    {"$match": {$and: [
      {"status": "booked"},
      {"state": "booked"}
    ]}}
  ],
  "inIntake": [
    {"$match": {$and: [
      {"status": "available"},
      {"inspectionStatus": {$ne: "done"}}
    ]}}
  ],
  "active": [
    {"$match": {$and: [
      {"status": "booked"},
      {"state": "active"}
    ]}}
  ],
  "unreturned": [
    {"$match": {"status": "forceCompleted"}}
  ]
}
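(If only the number of documents per bucket is needed, each facet branch could end with a `$count` stage so that `$facet` returns one small doc per bucket instead of every matching document, e.g.:

"available": [
  {"$match": {$and: [
    {"status": "available"},
    {"inspectionStatus": "done"}
  ]}},
  {"$count": "n"}
],
// ...and likewise for the other branches

but even then every branch still has to examine every input document.)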
If you really want to push the logic to the DB, here's a solution -- but you still have to examine the `XX` field doc by doc:
db.foo.aggregate([
  // Tag each doc with the bucket it belongs to; docs matching no
  // branch get XX: null.
  {$addFields: {XX: {$switch: {
    branches: [
      { case: {
          $and: [{$eq: ["$status","available"]}, {$eq: ["$inspectionStatus","done"]}]
        }, then: 'AVAILABLE' },
      { case: {
          $and: [{$eq: ["$status","booked"]}, {$eq: ["$state","booked"]}]
        }, then: 'MATCHED' },
      { case: {
          $and: [{$eq: ["$status","available"]}, {$ne: ["$inspectionStatus","done"]}]
        }, then: 'IN_INTAKE' },
      { case: {
          $and: [{$eq: ["$status","booked"]}, {$eq: ["$state","active"]}]
        }, then: 'ACTIVE' },
      { case: {
          $eq: ["$status","forceCompleted"]
        }, then: 'UNRETURNED' }
    ],
    default: null
  }}}},
  // Drop everything that didn't land in a bucket.
  {$match: {XX: {$ne: null}}}
]);
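Here, "examine the `XX` field doc by doc" just means bucketing on the client as the cursor streams back; a minimal sketch in the shell:

// collect returned docs into arrays keyed by their computed XX tag
const buckets = {};
db.foo.aggregate([ /* pipeline above */ ]).forEach(doc => {
  (buckets[doc.XX] = buckets[doc.XX] || []).push(doc);
});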
The end-to-end timing on this is actually a fair number of milliseconds better than a simple `find()` because less material is transferred, but of course the DB engine works a little harder processing the pipeline.
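And if the desired result is actually per-bucket counts rather than the documents themselves, the same `XX` tag lets the server finish the job; a sketch extending the pipeline above with `$group`:

db.foo.aggregate([
  {$addFields: {XX: {$switch: { /* branches as above */ default: null }}}},
  {$match: {XX: {$ne: null}}},
  // one output doc per bucket, e.g. {_id: "AVAILABLE", n: 123}
  {$group: {_id: "$XX", n: {$sum: 1}}}
]);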