node.js, mongodb, express, mongoose, angular15

How to filter and paginate efficiently with millions of records in MongoDB?


I know there are a lot of questions on this subject, and while most of the answers work, they perform really poorly when there are millions of records.

I have a collection with 10,000,000 records.

At first I was using mongoose paginator v2, and it took around 8 s to get each page without filtering and 25 s with filtering. That is fairly decent compared to the other answers I found googling around. Then I read about aggregate (in a similar question here) and it was a marvel: 7 ms to get each page without filtering, no matter which page it is:

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  let recordCount;
  ServiceClass.find().count().then((count) =>{
    recordCount = count;
    ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents,
        count: recordCount,
      });
    })
    .catch((error) => {
      res.status(500).json({ message: msgGettingRecordsError });
    });
  }).catch((error) => {
    res.status(500).json({ message: "Error getting record count" });
  });

Where I'm having issues is filtering. aggregate doesn't work like find, so my conditions are not working. I read the docs about aggregate and tried [ {$match: {description: {$regex: regex}}} ] inside aggregate as a start, but it did not return anything. This is my current working function for filtering and pagination (which takes 25 s):

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  const filter = req.params.filter;
  const regex = new RegExp(filter, 'i');

  ServiceClass.paginate({
    $or:[
      {code: { $regex: regex }},
      {description: { $regex: regex }},
    ]
  },{limit: pageSize, page: currentPage}).then((documents)=>{
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents
      });
    }).catch((error) => {
    res.status(500).json({ message: "Error getting the records." });
  });
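
One thing worth guarding in the function above: new RegExp(filter, 'i') treats the filter as a regular expression, so a search string such as a+b is interpreted as a pattern and a lone ( throws a SyntaxError. A minimal sketch of escaping the input first, assuming a plain substring search is what is wanted; escapeRegExp and buildFilterRegex are hypothetical helper names, not part of Mongoose or MongoDB:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a
// regular expression so the filter text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the case-insensitive regex used in the $or conditions.
function buildFilterRegex(filter) {
  return new RegExp(escapeRegExp(filter), 'i');
}
```

The result can be passed to $regex in place of the regex built directly from req.params.filter.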

code and description are both indexed: code has a unique index and description has a normal index. I need to search for documents that contain a string in either the code or the description field.

What is the most efficient way to filter and paginate when you have millions of records?


Solution

  • The code below gets the paginated result from the database together with the total count of documents matching that query, in a single aggregation.

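    const pageSize = +req.query.pagesize;
    const currentPage = +req.query.currentpage;
    // Same case-insensitive filter as in the question.
    const regex = new RegExp(req.params.filter, 'i');
    // With 1-based page numbers, page 1 skips 0 documents.
    const skip = currentPage * pageSize - pageSize;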
    const query = [
        {
          $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
        },
        {
          $facet: {
            result: [
              {
                $skip: skip,
              },
              {
                $limit: pageSize,
              },
              {
                $project: {
                  createdAt: 0,
                  updatedAt: 0,
                  __v: 0,
                },
              },
            ],
            count: [
              {
                $count: "count",
              },
            ],
          },
        },
        {
          $project: {
            result: 1,
            count: {
              $arrayElemAt: ["$count", 0],
            },
          },
        },
      ];
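    const result = await ServiceClass.aggregate(query);
    console.log(result);
    // aggregate() resolves to an array with a single element:
    // [{ result: [...page documents], count: { count: <total matches> } }]
    // (the count field is absent when nothing matches).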
    

    Hope it helps.
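
The pipeline above can be wrapped so that building it and reading its output are separate, testable steps. This is only a sketch under a few assumptions: page numbers are 1-based (as the skip formula implies), the inner $project is left out for brevity, and buildPagedPipeline and unwrapPage are hypothetical names. Note that $skip takes a number of documents, not a page number, which is why the offset is computed from both values.

```javascript
// Hypothetical helper: build the aggregation pipeline for one page.
// Assumes 1-based page numbers.
function buildPagedPipeline(regex, currentPage, pageSize) {
  const skip = (currentPage - 1) * pageSize; // $skip counts documents, not pages
  return [
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    {
      $facet: {
        result: [{ $skip: skip }, { $limit: pageSize }],
        count: [{ $count: "count" }],
      },
    },
    { $project: { result: 1, count: { $arrayElemAt: ["$count", 0] } } },
  ];
}

// Hypothetical helper: turn the aggregate() output into { documents, total }.
// aggregate() resolves to an array; with $facet it holds a single element.
function unwrapPage(aggregateOutput) {
  const [first] = aggregateOutput;
  return {
    documents: first ? first.result : [],
    // When nothing matches, the count facet is empty and `count` is missing.
    total: first && first.count ? first.count.count : 0,
  };
}
```

In the route handler this could be used as unwrapPage(await ServiceClass.aggregate(buildPagedPipeline(regex, currentPage, pageSize))), with documents and total then sent back as serviceClasses and count.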