Search code examples
mongodbprofilerchangestream

Mongo Changestream getmore profiling


While investigating performance issues on our Atlas hosted database, we began to observe a high count of Docs Returned in the Atlas profiler. This high number of returned documents is the result of the changestream getMore calls of the cursor, and we sometimes see the number of returned documents reaching over 5000 records. We are aware that we are not making that many changes and are wondering what could explain the high number of records being returned. Is this number a cumulative count from the time the cursor was established?

enter image description here


Solution

  • From the comments:

    Is the only question here if that number is cumulative or not? Yes, that is the question.

    No, the number in your screenshot is not cumulative.

    Your screenshot is from the Atlas profiler. Both the Atlas profiler documentation as well your reference to the screenshot being associated with a getMore call point to the fact that we are looking at the results of a single operation.

    Somewhat relatedly, the Response Length field in that screenshot corresponds to the reslen of the operation. Again this is the amount of data transferred for that single call. There is a hard cap of 16MB of data transmitted per batch, so the size of the documents being returned influence how many will get sent in a single batch.

    However,after doing my own research it appears that this number could be cumulative. I started a watch against my local mongo datatbase and made changes to the collection to see what the result of running db.currentOp({ns:"{dbname}.$cmd"}); would return. The nDocsReturned value seems to be counting up the number of updates made. How do you explain that if not cumulative ?

    Thanks for the additional context! You are looking at something 'different' here though which is the source of confusion and incorrect conclusion.

    In this interface the object being reported against is the cursor itself. A cursor is the object that is created when a query, including a change stream, is executed and will return more than one batch of data.

    As you have correctly concluded, the nDocsReturned metric in this interface is cumulative for the cursor. The documentation confirms this:

    The cumulative number of documents returned by the cursor.

    Immediately afterwards that page also informs us about nBatchesReturned:

    currentOp.cursor.nBatchesReturned

    The cumulative number of batches returned by the cursor.

    So each individual getMore call will add to these cumulative cursor totals that can be observed in currentOp. But the log file and profiler are only going to report the metrics associated with the individual getMore executions (that are slow enough to be reported in those places).

    We are aware that we are not making that many changes

    It sounds like either there are more changes being made than you are expecting or change streams are being (re)opened more often than anticipated (or not using 'resume after' functionality appropriately).