Search code examples
mongodbmongoosemlab

(How) can aggregate() break an index?


(How) can this code break an index on a MongoDB collection?

db.users.aggregate([
   { $group: { _id: null, total_orders: { $sum: { $size: "$orders" } } } },
])

This is (obviously) trying to count all the orders of all the users.

I have inherited a project with this code commented out. I need to uncomment (re-enable) the code, but would like to understand what I risk by doing so. The author of this change is unavailable, and their commit message for the commenting out is "remove statistics calls - breaking index."

The MongoDB we're using is via mlab.com, and hosted on AWS. We're connecting to mlab from a JS app using mongoose.


Solution

  • I suspect that the issue wasn't that the aggregation would break an index, but instead that the aggregation did not use indexes and would perform a collection scan.

    Aggregations can take advantage of indexes when there are $match and/or $sort stages placed at the beginning of a pipeline. This aggregation is only a single $group stage, meaning the entire collection would need to be iterated through to compute the count.

    I put a simple example below showing the aggregation performing a collection scan, even when the array field is indexed.

    > db.foo.insert({ "x" : [ 1, 2 ] } )
    > db.foo.insert({ "x" : [ 1 ] } )
    > db.foo.createIndex({ "x" : 1 } )
    ...
    
    > db.foo.aggregate([ { $group: { _id: null, cnt: { $sum : { $size: "$x" } } } } ] )
    { "_id" : null, "cnt" : 3 }
    
    // Results of a .explain() - see 'winningPlan' below
    > db.foo.explain(true).aggregate([ { $group: { _id: null, cnt: { $sum : { $size: "$x" } } } } ] )
    {
        "stages" : [
            {
                "$cursor" : {
                    "query" : {
    
                    },
                    "fields" : {
                        "x" : 1,
                        "_id" : 0
                    },
                    "queryPlanner" : {
                        "plannerVersion" : 1,
                        "namespace" : "stack.foo",
                        "indexFilterSet" : false,
                        "parsedQuery" : {
    
                        },
                        "winningPlan" : {
                            "stage" : "COLLSCAN",
                            "direction" : "forward"
                        },
                        "rejectedPlans" : [ ]
                    },
                    "executionStats" : {
                        "executionSuccess" : true,
                        "nReturned" : 2,
                        "executionTimeMillis" : 0,
                        "totalKeysExamined" : 0,
                        "totalDocsExamined" : 2,
                        "executionStages" : {
                            "stage" : "COLLSCAN",
                            "nReturned" : 2,
                            "executionTimeMillisEstimate" : 0,
                            "works" : 4,
                            "advanced" : 2,
                            "needTime" : 1,
                            "needYield" : 0,
                            "saveState" : 1,
                            "restoreState" : 1,
                            "isEOF" : 1,
                            "invalidates" : 0,
                            "direction" : "forward",
                            "docsExamined" : 2
                        },
                        "allPlansExecution" : [ ]
                    }
                }
            },
            {
                "$group" : {
                    "_id" : {
                        "$const" : null
                    },
                    "cnt" : {
                        "$sum" : {
                            "$size" : [
                                "$x"
                            ]
                        }
                    }
                }
            }
        ],
        "ok" : 1,
        ...
    }