Search code examples
mongodbmongodb-querymongodb-indexesquery-planner

Mongo doesn't optimize $or query by combining two IXSCANs


I have an orders collection with the following index, among others:

{location: 1, completedDate: 1, estimatedProductionDate: 1, estimatedCompletionDate: 1}

I'm performing the following query:

db.orders.find({
  status: {$in: [1, 2, 3]},
  location: "PA",
  $or: [
    {completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}},
    {
      completedDate: null,
      estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
    }
  ]
}).explain()

I was hoping this would perform an efficient IXSCAN for each branch of the $or, and then combine the results:

        {completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}}

        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[MinKey, ISODate("2017-08-22T04:59:59.999Z")]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }

        {
            completedDate: null,
            estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
        }

        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[null, null]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, ISODate("2017-08-22T04:59:59.999Z")]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }

Instead, it only bounds the location in the IXSCAN, and does the rest of the filtering during FETCH. Is there any way to optimize this query without splitting it into two separate queries?

"winningPlan" : {
    "stage" : "FETCH",
    "filter" : {
        "$and" : [
            {
                "$or" : [
                    {
                        "$and" : [
                            {
                                "completedDate" : {
                                    "$eq" : null
                                }
                            },
                            {
                                "estimatedProductionDate" : {
                                    "$lt" : "2017-08-22T04:59:59.999Z"
                                }
                            }
                        ]
                    },
                    {
                        "completedDate" : {
                            "$lt" : "2017-08-22T04:59:59.999Z"
                        }
                    }
                ]
            },
            {
                "status" : {
                    "$in" : [
                        1,
                        2,
                        3
                    ]
                }
            }
        ]
    },
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "location" : 1,
            "completedDate" : 1,
            "estimatedProductionDate" : 1,
            "estimatedCompletionDate" : 1
        },
        "indexName" : "location_1_completedDate_1_estimatedProductionDate_1_estimatedCompletionDate_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
            "location" : [
                "[\"TX\", \"TX\"]"
            ],
            "completedDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedProductionDate" : [
                "[MinKey, MaxKey]"
            ],
            "estimatedCompletionDate" : [
                "[MinKey, MaxKey]"
            ]
        }
    }
},

Solution

  • There are three issues that are immediately apparent:

    Your index

    I'm not sure about the other indexes you have, but your query is of the shape:

    {
      status:1,
      location:1,
      $or: [
        {completedDate:1},
        {completedDate:1, estimatedProductionDate:1}
      ]
    }
    

    However your index does not contain the term status. You would need the status field in your index to maximize index use.

    Your $or query

    To paraphrase the page $or Clauses and Indexes:

    ... for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.

    To put it simply, efficient $or queries in MongoDB would require the $or term to be the top-level term, with each part of the term supported by an index.

    For example, you may find the performance of the following index and query to be a bit better:

    db.orders.createIndex({
      status:1,
      location:1,
      completedDate:1,
      estimatedProductionDate:1
    })
    
    db.orders.explain().find({
      $or: [
        {
          status: {$in: [1, 2, 3]},
          location: "PA",
          completedDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}},
        {
          status: {$in: [1, 2, 3]},
          location: "PA",
          completedDate: null,
          estimatedProductionDate: {$lt: ISODate("2017-08-22T04:59:59.999Z")}
        }
      ]
    })
    

    The reason is because MongoDB treats each of the term in an $or query to be a separate query. Thus, each term can use its own index.

    Note that the order of fields in the index I proposed above follows the order of the fields in the query.

    However, this is still not optimal, because MongoDB has to perform a fetch with filter: {completedDate: {$eq: null}} after the index scan for a query with completedDate: null. The reason for this is subtle and best explained here:

    1. The document {} generates the index key {"": null} for the index with key pattern {"a.b": 1}.
    2. The document {a: []} also generates the index key {"": null} for the index with key pattern {"a.b": 1}.
    3. The document {} matches the query {"a.b": null}.
    4. The document {a: []} does not match the query {"a.b": null}.

    Therefore, a query {"a.b": null} that is answered by an index with key pattern {"a.b": 1} must fetch the document and re-check the predicate, in order to ensure that the document {} is included in the result set and that the document {a: []} is not included in the result set.

    To maximize index use, you may be better off just to assign something into the completedDate field instead of setting it to null.