Mongo find() to return all documents is slow for certain documents


I'm experiencing an issue when using PyMongo to iterate over all documents in a particular collection. The loop needs to scan about 450k documents, and it is nearly instant on almost every document except for a handful where a single iteration takes 10-90 seconds.

for testscriptexec in testscriptexecs.find({}, {"tsExecId": 1, "involvedOrgs": 1, "qualifiedName": 1, "endTime": 1, "status": 1}):

I'm trying to figure out what is slowing down the Cursor on certain documents. I determined that the long delays always occur on the same documents.

I compared the JSON export of a slow document with that of a fast one, and I do not see anything that should be slowing down the indexed search on _id. The documents are not particularly large, and the fields that I'm actually pulling are exactly the same size.

The collection has an index on _id, as well as a few other indices that are not relevant to this code.

What are some things that could be causing this query to hang on certain iterations of a find by ID?


Solution

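  • These questions are always a bit subjective, but one thought: MongoDB returns cursor results in batches. Most iterations are served from the batch the client already holds, while the iteration that exhausts a batch has to wait for a round trip to the server to fetch the next one. Pauses that always land on the same documents would be consistent with those batch boundaries, so that could explain what you are seeing.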

    You could rule this in or out by tweaking the batch_size parameter on your find(): https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor.batch_size
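One way to test the batching idea is to time only the wait for each next item and record which iterations are slow. The following is a minimal sketch, not a definitive diagnostic: `find_slow_iterations` and `threshold_s` are hypothetical names introduced here, and the helper accepts any iterable, so it should work with a PyMongo cursor as well as with a plain list.

```python
import time
from typing import Any, Iterable, Iterator, List, Tuple


def find_slow_iterations(
    items: Iterable[Any], threshold_s: float = 1.0
) -> List[Tuple[int, float]]:
    """Return (index, seconds) for every item whose fetch exceeded threshold_s.

    Only the time spent waiting for the next item is measured, so any work
    done with the item afterwards does not affect the result.
    """
    slow: List[Tuple[int, float]] = []
    iterator: Iterator[Any] = iter(items)
    index = 0
    while True:
        start = time.perf_counter()
        try:
            next(iterator)  # for a cursor, this may trigger a server round trip
        except StopIteration:
            break
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow.append((index, elapsed))
        index += 1
    return slow
```

To use it with the query from the question, pass the cursor in once as it is and once with a different batch size, for example `testscriptexecs.find({}, projection).batch_size(1000)`, where `projection` stands for the field dictionary from the question. If the slow indices move when the batch size changes, batching is the likely cause; if they stay on the same documents, the cause is probably elsewhere.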