I am working on a project that involves real-time analytics on a large dataset stored in a sharded MongoDB cluster. I need to use the aggregation pipeline to process this data efficiently, but I am experiencing performance issues. Specifically, I want to ensure that my queries run with minimal lag. Here is the pipeline I am using:
const pipeline = [
  { $match: { status: "active" } },                            // filter to active documents
  { $group: { _id: "$category", total: { $sum: "$value" } } }, // total value per category
  { $sort: { total: -1 } },                                    // highest totals first
  { $limit: 10 }                                               // keep the top 10
];
// allowDiskUse lets the $group and $sort stages spill to disk instead of
// failing on the 100 MB per-stage memory limit
const results = await db.collection.aggregate(pipeline, { allowDiskUse: true }).toArray();
console.log(results);
I have set allowDiskUse to true in the aggregation options, which helps with memory usage, but I am still experiencing delays. I expected the aggregation pipeline to run more efficiently and provide results quickly. I also tried indexing the fields used in the pipeline but with limited success.
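For reference, the index I tried looks roughly like this (field names match my pipeline; the key order is my guess at what should support the $match stage):

// Illustrative: compound index with the $match equality field first,
// then the $group key and the summed field
db.collection.createIndex({ status: 1, category: 1, value: 1 });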
I am looking for best practices or optimizations that can help reduce the lag in my queries.
The problem is that { status: "active" } on its own is not a selective $match. If most documents in a large collection are active, that stage passes nearly the entire collection through to $group, so an index on status buys you little: the server still has to scan and group almost everything. The $sum is then slow as well, because it accumulates over that huge intermediate result rather than a narrowed one.
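If your documents carry a timestamp (or any other field you can put a bound on), tightening the $match is usually the biggest win. The sketch below is illustrative: it assumes a createdAt field, which your schema may not have, plus a supporting compound index. On a sharded cluster, also try to include the shard key in the $match so mongos can target specific shards instead of broadcasting the query.

// Assumed field: createdAt (adapt to whatever bounded field your schema has)
db.collection.createIndex({ status: 1, createdAt: 1 });

const since = new Date(Date.now() - 60 * 60 * 1000); // e.g. last hour only

const pipeline = [
  // A bounded $match keeps the intermediate result small and can use
  // the { status: 1, createdAt: 1 } index
  { $match: { status: "active", createdAt: { $gte: since } } },
  { $group: { _id: "$category", total: { $sum: "$value" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
];

// Check that the $match stage is actually backed by an index scan (IXSCAN)
db.collection.explain("executionStats").aggregate(pipeline);

If there is no natural bound, recomputing a top-10 over the whole collection on every request will always be expensive; in that case consider precomputing the totals incrementally (for example, a scheduled pipeline that writes to a summary collection via $merge) and serving reads from that.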