Tags: mongodb, performance, timeout

Mongodb - group() can't handle more than 20000 unique keys


I am running a GROUP BY query on my MongoDB cluster, using an online SQL-to-Mongo query converter tool. After running the converted query, I get an error. The data set is large; is there a way around this?

// SQL  query
SELECT COUNT(*) AS count FROM db_name
WHERE version="v2"
GROUP BY id

// Mongo query (converter output; the generated reduce function
// reduces to a simple counter, since the grouped key is never an array)
db.db_name.group({
    "key": { "id": true },
    "initial": { "count": 0 },
    "reduce": function(obj, prev) {
        prev.count++;
    },
    "cond": { "version": "v2" }
});

and I am getting this error:

 E QUERY    [js] Error: group command failed: {
"operationTime" : Timestamp(1589898357, 1),
"ok" : 0,
"errmsg" : "Plan executor error during group command :: caused by :: group() can't handle more than 20000 unique keys",
"code" : 2,
"codeName" : "BadValue",
"$clusterTime" : {
    "clusterTime" : Timestamp(1589898357, 1),
    "signature" : {
        "hash" : BinData(0,"SvsjmAIsn4rGwA/aRtLt3MPenJQ="),
        "keyId" : NumberLong("6784431306852794369")
    }
}
} :

Solution

  • The group command is limited to 20,000 unique keys and is deprecated (it was removed in MongoDB 4.2). Use db.collection.aggregate() instead:

    db.db_name.aggregate([
        { $match: { version: "v2" } },
        { $group: { _id: "$id", count: { $sum: 1 } } },
        { $project: { _id: 0 } }
    ])
    

    The above aggregation operation selects documents whose version equals "v2", groups the matching documents by the id field, and computes the equivalent of COUNT(*) for each distinct id.

    As mentioned in docs:

    The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

    The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase.
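To make the pipeline's behavior concrete, here is a plain-JavaScript sketch that mimics the `$match` and `$group` stages over an in-memory array. The sample documents are invented for illustration; a real deployment would run the aggregate() call above against the collection instead.

```javascript
// Illustrative in-memory simulation of { $match } followed by { $group }.
// The sample documents below are made up for the example.
const docs = [
    { id: "a", version: "v2" },
    { id: "a", version: "v2" },
    { id: "b", version: "v2" },
    { id: "b", version: "v1" }   // excluded by the $match stage
];

// $match: { version: "v2" } — keep only matching documents
const matched = docs.filter(d => d.version === "v2");

// $group: { _id: "$id", count: { $sum: 1 } } — count per distinct id
const counts = {};
for (const d of matched) {
    counts[d.id] = (counts[d.id] || 0) + 1;
}

const result = Object.entries(counts)
    .map(([id, count]) => ({ _id: id, count }));

console.log(result); // [ { _id: 'a', count: 2 }, { _id: 'b', count: 1 } ]
```

For very large collections, the real pipeline can be run with the `allowDiskUse: true` option (e.g. `db.db_name.aggregate([...], { allowDiskUse: true })`) so that the `$group` stage may spill to disk instead of being capped by the in-memory limit.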