
Query performing faster without the index


Below is a simplified version of a document in my database:

{
    _id: 1,
    main_data: 100,
    sub_docs: [
        {
            _id: "a",
            data: 22
        },
        {
            _id: "b",
            data: 859
        },
        {
            _id: "c",
            data: 151
        },

        ... snip ...

        {
            _id: "m",
            data: 721
        },
        {
            _id: "n",
            data: 111
        }
    ]
}

So imagine I have a million of these documents with varied data values (say 0 - 1000). Currently my query is something like:

db.myDb.find(
    { sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 } } } }
)

Also say the query above will only match around 0.001% of the data (so around 10 documents are returned in total).

And I have an index set using:

db.myDb.ensureIndex({ "sub_docs.data": 1 })

Performing a timed test on this data suggests the query is quicker without any index on sub_docs.data.

I'm using Mongo 3.2.8.

Edit - Additional information:

My timed test is a Perl script that queries the server and then pulls back the relevant data. I ran the test with the index enabled first, but the slow query times prompted some digging. I wanted to see how much worse the query times would get if I dropped the index, yet dropping it actually improved the response time. Going further, I plotted query response time against the total number of documents in the DB: both graphs show a linear increase in query time, but the indexed query's time grows at a much faster rate. Throughout testing I kept an eye on server memory usage (which stayed low), as my first thought was that the index might not fit in memory.
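
A quick way to quantify this from the mongo shell itself, without an external timing script, is explain("executionStats"), which reports how much work the server actually did for the query:

db.myDb.find(
    { sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 } } } }
).explain("executionStats")

// Fields worth comparing between the indexed and unindexed runs:
//   executionStats.executionTimeMillis  -- server-side execution time
//   executionStats.totalKeysExamined    -- index keys scanned (0 for a COLLSCAN)
//   executionStats.totalDocsExamined    -- documents fetched and filtered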

So overall my question is: why does this particular query perform better without an index? And is there any way to improve the speed of this query with a better index?

Update

OK, so it's been a while, and I've narrowed it down to the index not constraining both ends of the query's search range.

The query above will show an index bound of:

[-inf, 160]

rather than 110 to 160. I can work around this using the cursor min() and max() methods as follows:

db.myDb.find(
    { sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 } } } }
).min({ 'sub_docs.data': 110 }).max({ 'sub_docs.data': 160 })

However, I would prefer a different approach if possible, as I would like to use the aggregation framework (which doesn't seem to support the min()/max() cursor methods).
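
For reference, the aggregation version I would like to run is something like the sketch below (the pipeline shape is my assumption, based on the find() above). A $match stage at the start of the pipeline can still use the index, but aggregate() offers no equivalent of min()/max() to tighten its bounds:

db.myDb.aggregate([
    { $match: { sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 } } } } }
    // ... further stages would go here ...
])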


Solution

  • OK, so I managed to sort this in the end. For whatever reason the index doesn't bound the query as I expected.

    Running this:

    db.myDb.find({ sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 } } } }).explain()
    

    A snippet of the explain output, showing what the index is doing, is below:

                          "inputStage" : {
                                    "stage" : "IXSCAN",
                                    "keyPattern" : {
                                            "sub_docs.data" : 1
                                    },
                                    "indexName" : "sub_docs.data_1",
                                    "isMultiKey" : true,
                                    "isUnique" : false,
                                    "isSparse" : false,
                                    "isPartial" : false,
                                    "indexVersion" : 1,
                                    "direction" : "forward",
                                    "indexBounds" : {
                                            "sub_docs.data" : [
                                                    "[-inf.0, 160.0)"
                                            ]
                                    }
                            }
    

    Instead of limiting the index scan to between 110 and 160, it scans every index key below 160. I've not included it here, but the other (rejected) plan was an index scan from 110 to +inf. You can work around this with the min()/max() limits I mention in the question update above, however this means you can't use the aggregation framework, which sucks.
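
    My best guess at why the bounds stay loose: in a multikey index every array element gets its own index key, so a document can satisfy the two halves of a range through different elements. The hypothetical document below would be reached via the < 160 bound through one key and the >= 110 bound through another, even though no single element falls in [110, 160). With $elemMatch both conditions must hold for one element, so in principle the planner could intersect the bounds here, but on 3.2.8 (indexVersion 1) it evidently does not:

    {
        _id: 2,
        sub_docs: [
            { _id: "x", data: 50 },   // indexed key 50: matches data < 160
            { _id: "y", data: 200 }   // indexed key 200: matches data >= 110
        ]
    }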

    So the solution I found was to pull the values I want to index on out into their own array:

    {
        _id: 1,
        main_data: 100,
        index_values: [
            22,
            859,
            151,

            ...snip...

            721,
            111
        ],
        sub_docs: [
            {
                _id: "a",
                data: 22
            },
            {
                _id: "b",
                data: 859
            },
            {
                _id: "c",
                data: 151
            },

            ... snip ...

            {
                _id: "m",
                data: 721
            },
            {
                _id: "n",
                data: 111
            }
        ]
    }
    

    And then I create the index:

    db.myDb.ensureIndex({ index_values: 1 })
    

    And then query on that instead:

    db.myDb.find({ index_values : { $elemMatch: { $gte: 110, $lt: 160 } } }).explain()
    

    Which produces:

    "indexBounds" : {
           "index_values" : [
               "[110.0, 160.0]"
           ]
    }
    

    So there are far fewer documents to check now!
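
    The one maintenance cost is keeping index_values in step with sub_docs. A minimal sketch of how a write could do that (field names as above; the new sub-document here is hypothetical) is to $push to both arrays in a single update:

    db.myDb.update(
        { _id: 1 },
        {
            $push: {
                sub_docs: { _id: "o", data: 42 },  // the real sub-document
                index_values: 42                   // its indexed copy
            }
        }
    )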