Search code examples
mongodbindexingquery-performance

MongoDB index going stale


Using mongo server v3.6.16.

I have a mongo collection with about 18m records. Records are being added at about 100k a day. I have a query that runs fairly often on the collection that depends on two values - user_id and server_time_stamp. I have a compound index set up for those two fields.

The index is regularly getting stale - and queries are taking minutes to complete and causing the server to burn all the CPU it can grab. As soon as I regenerate the index, queries happen quickly. But then a day or two later, the index is stale again. (ed. the index is failing more quickly now - within 30 mins.) I have no idea why the index is going stale - what can I look for?


Edit

Here are the index Fields:

{ 
   "uid" : 1, 
   "server_time_stamp" : -1
}

and index options:

{ 
    "v" : 2, 
    "name" : "server_time_stamp_1_uid_1", 
    "ns" : "sefaria.user_history"
}

This appears to be a Heisenbug. When I used "explain", it performs well. Here is one of the pathological queries, from the long query log, taking 445 seconds:

sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms

Here's the results of explain for a performant query, right after regenerating the index:

{ 
    "queryPlanner" : {
        "plannerVersion" : NumberInt(1), 
        "namespace" : "sefaria.user_history", 
        "indexFilterSet" : false, 
        "parsedQuery" : {
            "$and" : [
                {
                    "uid" : {
                        "$eq" : 80588.0
                    }
                }, 
                {
                    "server_time_stamp" : {
                        "$gt" : 1577918252.0
                    }
                }
            ]
        }, 
        "winningPlan" : {
            "stage" : "FETCH", 
            "inputStage" : {
                "stage" : "IXSCAN", 
                "keyPattern" : {
                    "uid" : NumberInt(1), 
                    "server_time_stamp" : NumberInt(-1)
                }, 
                "indexName" : "server_time_stamp_1_uid_1", 
                "isMultiKey" : false, 
                "multiKeyPaths" : {
                    "uid" : [

                    ], 
                    "server_time_stamp" : [

                    ]
                }, 
                "isUnique" : false, 
                "isSparse" : false, 
                "isPartial" : false, 
                "indexVersion" : NumberInt(2), 
                "direction" : "forward", 
                "indexBounds" : {
                    "uid" : [
                        "[80588.0, 80588.0]"
                    ], 
                    "server_time_stamp" : [
                        "[inf.0, 1577918252.0)"
                    ]
                }
            }
        }, 
        "rejectedPlans" : [
            {
                "stage" : "FETCH", 
                "filter" : {
                    "server_time_stamp" : {
                        "$gt" : 1577918252.0
                    }
                }, 
                "inputStage" : {
                    "stage" : "IXSCAN", 
                    "keyPattern" : {
                        "uid" : NumberInt(1), 
                        "book" : NumberInt(1), 
                        "last_place" : NumberInt(1)
                    }, 
                    "indexName" : "uid_1_book_1_last_place_1", 
                    "isMultiKey" : false, 
                    "multiKeyPaths" : {
                        "uid" : [

                        ], 
                        "book" : [

                        ], 
                        "last_place" : [

                        ]
                    }, 
                    "isUnique" : false, 
                    "isSparse" : false, 
                    "isPartial" : false, 
                    "indexVersion" : NumberInt(2), 
                    "direction" : "forward", 
                    "indexBounds" : {
                        "uid" : [
                            "[80588.0, 80588.0]"
                        ], 
                        "book" : [
                            "[MinKey, MaxKey]"
                        ], 
                        "last_place" : [
                            "[MinKey, MaxKey]"
                        ]
                    }
                }
            }, 
            {
                "stage" : "FETCH", 
                "filter" : {
                    "server_time_stamp" : {
                        "$gt" : 1577918252.0
                    }
                }, 
                "inputStage" : {
                    "stage" : "IXSCAN", 
                    "keyPattern" : {
                        "uid" : NumberInt(1)
                    }, 
                    "indexName" : "uid", 
                    "isMultiKey" : false, 
                    "multiKeyPaths" : {
                        "uid" : [

                        ]
                    }, 
                    "isUnique" : false, 
                    "isSparse" : false, 
                    "isPartial" : false, 
                    "indexVersion" : NumberInt(2), 
                    "direction" : "forward", 
                    "indexBounds" : {
                        "uid" : [
                            "[80588.0, 80588.0]"
                        ]
                    }
                }
            }
        ]
    }, 
    "executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : NumberInt(97), 
        "executionTimeMillis" : NumberInt(1), 
        "totalKeysExamined" : NumberInt(97), 
        "totalDocsExamined" : NumberInt(97), 
        "executionStages" : {
            "stage" : "FETCH", 
            "nReturned" : NumberInt(97), 
            "executionTimeMillisEstimate" : NumberInt(0), 
            "works" : NumberInt(99), 
            "advanced" : NumberInt(97), 
            "needTime" : NumberInt(0), 
            "needYield" : NumberInt(0), 
            "saveState" : NumberInt(3), 
            "restoreState" : NumberInt(3), 
            "isEOF" : NumberInt(1), 
            "invalidates" : NumberInt(0), 
            "docsExamined" : NumberInt(97), 
            "alreadyHasObj" : NumberInt(0), 
            "inputStage" : {
                "stage" : "IXSCAN", 
                "nReturned" : NumberInt(97), 
                "executionTimeMillisEstimate" : NumberInt(0), 
                "works" : NumberInt(98), 
                "advanced" : NumberInt(97), 
                "needTime" : NumberInt(0), 
                "needYield" : NumberInt(0), 
                "saveState" : NumberInt(3), 
                "restoreState" : NumberInt(3), 
                "isEOF" : NumberInt(1), 
                "invalidates" : NumberInt(0), 
                "keyPattern" : {
                    "uid" : NumberInt(1), 
                    "server_time_stamp" : NumberInt(-1)
                }, 
                "indexName" : "server_time_stamp_1_uid_1", 
                "isMultiKey" : false, 
                "multiKeyPaths" : {
                    "uid" : [

                    ], 
                    "server_time_stamp" : [

                    ]
                }, 
                "isUnique" : false, 
                "isSparse" : false, 
                "isPartial" : false, 
                "indexVersion" : NumberInt(2), 
                "direction" : "forward", 
                "indexBounds" : {
                    "uid" : [
                        "[80588.0, 80588.0]"
                    ], 
                    "server_time_stamp" : [
                        "[inf.0, 1577918252.0)"
                    ]
                }, 
                "keysExamined" : NumberInt(97), 
                "seeks" : NumberInt(1), 
                "dupsTested" : NumberInt(0), 
                "dupsDropped" : NumberInt(0), 
                "seenInvalidated" : NumberInt(0)
            }
        }
    }, 
    "serverInfo" : {
        "host" : "mongo-deployment-5cf4f4fff6-dz84r", 
        "port" : NumberInt(27017), 
        "version" : "3.6.15", 
        "gitVersion" : "18934fb5c814e87895c5e38ae1515dd6cb4c00f7"
    }, 
    "ok" : 1.0
}

Solution

  • The issue was about a query that runs well and uses the indexes suddenly stops using the index and results in a very poor performance. This is noted in the query plan and the log respectively.

    The explain's output:

    The query plan's "executionStats" says "totalKeysExamined" : NumberInt(97). The query filter is using index defined on the collection ("stage" : "IXSCAN") and the compound index "server_time_stamp_1_uid_1" is used. Also, the query's sort is using the index (the index on _id). As it is the query and the indexes are working as they are meant to be. And, "executionTimeMillis" : NumberInt(1) says that it is a performant query.

    Details from the log:

    { ... 
    find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }
    planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277  numYields:142780 nreturned:79 
    ... }
    

    From the log, note that the index "server_time_stamp_1_uid_1" is not used.

    Discussion:

    The data and the index (called as working set) for the frequently used queries are kept in the memory (RAM + file system cache). If the working set is not in the memory the system has to load it into the memory during the operation and it results in a slower performance. Reading from disk drive is much slower than the memory. Note that SSD drives are much faster than the HDD drives and when there is no option to increase the memory this could be an option.

    Also, if the query is using indexes and the index size is large and could not be in memory, the index has to be read from the disk drive and it will slow down the operation. More memory is a solution and when not possible the solution can be in redesigning (or re-modeling) the data and its indexes.

    But, the the problem in this case was not the available memory; there is enough of it.

    The following info gives an idea about how much memory might be used for the working set for a given query: db.collection.stats().indexSizes, size, count and avgObjSize.


    Solution:

    The query log with slow performance shows that the index "server_time_stamp_1_uid_1" is not used: planSummary: IXSCAN { _id: 1 }.

    One way to make sure and force the query to use the index (always) is to use the hint on the query. The hint need to be on the index "server_time_stamp_1_uid_1". This way the situation as seen in the log will not happen.

    Another way is to keep the index active in the memory. This can be achieved by running a query on the indexed fields only (a covered query: the query filter and returned fields are of indexed fields only). Running this dummy query, which runs often or before the actual query will make sure the index is available in the memory.

    In this case, as @Laizer mentioned that supplying the hint to the query helped resolve the issue.