mongodb, cursor, pymongo

Improving performance and understanding the IOPS and usage of a MongoDB Cursor


I am trying to improve the efficiency and speed of my code that uses MongoDB. The code is written in Python and uses the pymongo module.

Currently I have a section in my code that receives a list of values that were possibly removed from the DB and validates which values were actually removed:

verified_removed = []
for value in possibly_removed:
    if db.items.find_one({"name" : value}) is None:
        verified_removed.append(value)

I know that I can change it to something like this:

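# find() yields whole documents, so collect just the "name" values into a set
cursor = db.items.find({"name": {"$in": possibly_removed}}, {"name": 1, "_id": 0})
still_exist = {doc["name"] for doc in cursor}
verified_removed = [val for val in possibly_removed if val not in still_exist]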

But I wasn't sure about one thing:
The find method creates a cursor that can be iterated, but is the cursor more efficient than calling find_one for each of my tested values? Or will my IOPS stay the same in both cases?

How exactly does the cursor work? And what would be the best way to improve performance when I have to iterate over or update many objects in my DB every ~1 minute?


Solution

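  • find() returns a cursor that fetches results from the server a batch at a time, so a single query with $in costs one round trip per batch instead of one per value. In most cases it will be more efficient than calling find_one() multiple times. The documentation has more detail: https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#cursor-batches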

    If you want to improve performance, consider adding an index on the field you are filtering on. Also check out bulk operations.
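
To make the advice above concrete, here is a minimal sketch rather than a definitive implementation. It assumes `collection` is a pymongo Collection such as db.items, that documents carry a string "name" field as in the question, and that an index on "name" exists. The helper names find_removed and touch_items, and the "last_seen" field, are made up for illustration.

```python
def find_removed(collection, possibly_removed):
    """Return the values from possibly_removed that have no matching document.

    `collection` is assumed to be a pymongo Collection (e.g. db.items) whose
    documents carry a string "name" field, as in the question.
    """
    if not possibly_removed:
        return []
    # One query for all candidates; the projection returns only "name",
    # so each batch carries less data over the wire.
    cursor = collection.find(
        {"name": {"$in": list(possibly_removed)}},
        {"name": 1, "_id": 0},
    )
    # The cursor pulls documents from the server batch by batch as we iterate.
    still_exist = {doc["name"] for doc in cursor}
    # Set membership is cheap, and the input order is preserved.
    return [value for value in possibly_removed if value not in still_exist]


def touch_items(collection, names, when):
    """Hypothetical bulk update: stamp a made-up "last_seen" field on many items."""
    from pymongo import UpdateOne  # local import keeps find_removed dependency-free

    requests = [
        UpdateOne({"name": name}, {"$set": {"last_seen": when}})
        for name in names
    ]
    # bulk_write rejects an empty request list, so skip the call in that case.
    # ordered=False lets the server keep going past individual failures.
    if requests:
        collection.bulk_write(requests, ordered=False)
```

With a real connection, you would call db.items.create_index("name") once so the $in lookup can use the index, then run find_removed(db.items, possibly_removed) on each pass. touch_items sends all updates in one bulk_write call instead of one update_one per value.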