Search code examples
mongodbdatemongodb-queryaggregation-frameworkquery-optimization

How to speed up aggregation query including date string comparisons in array without using $unwind?


I have a collection which contains documents with nested arrays. There are 1 million documents with thousands of objects in subfield array. Documents are quite large, but for purpose of explaining consider the following two documents:

[
  {
    "id": "myid",
    "field": {
      "total": 1,
      "subfield": [
        {
          "somefield": "1000",
          "time": "2020-08-06T08:33:57.977+0530",
          "val": [
            {
              "x": "someval",
              "a": "val1",
              "b": "val2"
            }
          ]
        },
        {
          "somefield": "2000",
          "time": "2020-05-08T04:13:27.977+0530",
          "val": [
            {
              "x": "someval2",
              "a": "val1",
              "b": "val2"
            }
          ]
        }
      ]
    }
  },
  {
    "id": "myid2",
    "field": {
      "total": 1,
      "subfield": [
        {
          "somefield": "1001",
          "time": "2020-07-31T10:15:50.184+0530",
          "val": [
            {
              "x": "someval2",
              "a": "val1",
              "b": "val2"
            },
            {
              "x": "someval2",
              "a": "val1",
              "b": "val2"
            }
          ]
        }
      ]
    }
  }
]

Use case:

I need to project only the id of documents with time date(grouped by date) greater than a value and field. subfield.val.b or field. subfield.val.a with a particular value.

I have the query to achieve my use case using $unwind,$toDate,$dateToString operators.

But the use of $unwind for large arrays causes the overall set to use a lot of memory and slows things down. It takes more than 15 minutes now.(I have not added any indexes because even if I create indexes for created, when I run explain for aggregation, the winning query does not use the provided index)

My current query:

db.collection.aggregate([
  {
    $unwind: {
      path: "$field.subfield",
      
    }
  },
  {
    $unwind: {
      path: "$field.subfield.val",
      
    }
  },
  {
    $addFields: {
      created_at: {
        $toDate: "$field.subfield.time"
      }
    }
  },
  {
    $match: {
      $and: [
        {
          $expr: {
            $gt: [
              {
                "$dateToString": {
                  "date": "$created_at",
                  "format": "%Y-%m-%d"
                }
              },
              "2020-04-28"
            ]
          }
        },
        {
          $or: [
            {
              "field.subfield.val.a": {
                "$eq": "val1"
              }
            },
            {
              "field.subfield.val.b": {
                "$eq": "val1"
              }
            }
          ]
        }
      ]
    }
  },
  {
    $group: {
      _id: "$id"
    }
  }
])

Query in MongoDB Playground

I need to limit the execution time of the query to less than 30 seconds. I hope the process can be made faster if done without $unwind.

My MongoDB Server version is 4.0.3

What are the other possible optimisations that can be done?

Thanks!


Solution

  • Possible optimisations that can be done:

    1. Since $unwind slows down the query, $filter can be used to get matching results from nested array.
    2. Date string comparisons are heavy, So its better to store the dates as MongoDB date object and do all the comparisons with date type itself instead of date string type.
    db.collection.aggregate([
      {
        "$project": {
          "obj1": {
            "$filter": {
              "input": "$field.subfield",
              "as": "el",
              "cond": {
                "$and": [
                  {
                    "$gt": [
                      "$$el.time",
                      new Date("2020-04-29")
                    ]
                  }
                ]
              }
            }
          },
          id: 1,
          
        }
      },
      {
        $match: {
          $or: [
            {
              "obj1.val.a": {
                "$eq": "val1"
              }
            },
            {
              "obj1.val.b": {
                "$eq": "val1"
              }
            }
          ]
        }
      },
      {
        $project: {
          id: 1,
          _id: 0
        }
      }
    ])
    

    Playground Example