Search code examples
mongodbnullmongodb-querymongodb-indexesquery-planner

Why Mongo query for null filters in FETCH after performing IXSCAN


According to Mongo Documentation,

The { item : null } query matches documents that either contain the item field whose value is null or that do not contain the item field.

I can't find documentation for this, but as far as I can tell, both cases (value is null or field is missing) are stored in the index as null.

So if I do db.orders.createIndex({item: 1}) and then db.orders.find({item: null}), I would expect an IXSCAN to find all documents that either contain the item field whose value is null or that do not contain the item field, and only those documents.

So then why does db.orders.find({item: null}).explain() perform filter: {item: {$eq: null}} in the FETCH stage after it performs an IXSCAN? What possible documents could need to be filtered out?

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "temp.orders",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "item" : {
                "$eq" : null
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "item" : {
                    "$eq" : null
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "item" : 1
                },
                "indexName" : "item_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "item" : [
                        "[null, null]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "Andys-MacBook-Pro-2.local",
        "port" : 27017,
        "version" : "3.2.8",
        "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0"
    },
    "ok" : 1
}

I thought maybe undefined values would get indexed as null, but simple experimentation rules this out:

> db.orders.createIndex({item: 1})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.orders.insert({item: undefined})
WriteResult({ "nInserted" : 1 })
> db.orders.find({item: {$type: 6}}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "temp.orders",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "item" : {
                "$type" : 6
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "item" : {
                    "$type" : 6
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "item" : 1
                },
                "indexName" : "item_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "item" : [
                        "[undefined, undefined]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "Andys-MacBook-Pro-2.local",
        "port" : 27017,
        "version" : "3.2.8",
        "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0"
    },
    "ok" : 1
}

Solution

  • The semantics for a null equality match predicate (e.g. {"a.b": null}) are complicated enough because a field could contain subdocuments that an index scan alone isn't enough to provide the correct result.

    According to https://jira.mongodb.org/browse/SERVER-18653?focusedCommentId=931817&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-931817,

    Version 2.6.0 of the server changed the semantics of a null equality match predicate, such that the document {a: []} was no longer considered a match for the query predicate {"a.b": null} (in prior versions of the server, this document was considered a match for this predicate). This is documented in the 2.6 compatibility notes, under the "null comparison" section.

    For an index with key pattern {"a.b": 1}, this document {a: []} generates the index key {"": null}. Other documents like {a: null} and the empty document {} also generate the index key {"": null}. As a result, if a query with predicate {"a.b": null} uses this index, the query system cannot tell just from the index key {"": null} whether or not the associated document matches the predicate. As a result, INEXACT_FETCH bounds are assigned instead of EXACT bounds, and hence a FETCH stage is added to the query execution tree.

    Additional explanation:

    1. The document {} generates the index key {"": null} for the index with key pattern {"a.b": 1}.
    2. The document {a: []} also generates the index key {"": null} for the index with key pattern {"a.b": 1}.
    3. The document {} matches the query {"a.b": null}.
    4. The document {a: []} does not match the query {"a.b": null}.

    Therefore, a query {"a.b": null} that is answered by an index with key pattern {"a.b": 1} must fetch the document and re-check the predicate, in order to ensure that the document {} is included in the result set and that the document {a: []} is not included in the result set.