Tags: python, database, mongodb, performance, pymongo

Best way to speed up PyMongo loop


I'm storing product data in a MongoDB database. I loop over roughly 50 IDs, and on each iteration I search for the ID. If the ID doesn't exist, I add it; if it exists and another field has a specific value, I run a function.

for id in ids:
  value = db.find_one({"value": id})  # one query (one round trip) per ID
  if value:
    # It checks for some other columns here using both the ID and the return value
    ...
  else:
    # It adds the ID and some other information to the database
    ...

The problem is that this is very inefficient, since every iteration sends its own query. When searching for other ways to do this, all the results show how to get a list of results, but I'm not sure how to apply that to my scenario, since I'm running functions and checks with each result and ID.

Thank you!


Solution

  • You can improve this by issuing a single find request for all the IDs, and then adding the missing documents to the DB in a second step, for example with insert_many.

    values = db.find({"value": {"$in": ids}})
    for value in values:
        # It checks for some other columns here using both the ID and the return
        ids.remove(value["value"])  # IDs left in ids afterwards are missing from the DB

    # Do all your inserts
    # with a loop
    for id in ids:
        db.insert_one(...)
    # or with insert_many
    db.insert_many(...)
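
The same idea can be sketched as a small reusable function. This is only a sketch under assumptions: the name sync_ids and the callbacks on_existing and make_document are hypothetical, the ID is assumed to live in a field called "value" as in the question, and the collection argument is assumed to be a PyMongo collection (anything with compatible find and insert_many methods would work).

```python
def sync_ids(collection, ids, on_existing, make_document):
    """Query once for all IDs, handle the hits, then bulk-insert the misses.

    on_existing(id, doc) and make_document(id) are hypothetical callbacks
    standing in for the checks and the insert payload from the question.
    """
    missing = set(ids)

    # One query for every ID instead of one find_one per ID.
    for doc in collection.find({"value": {"$in": list(ids)}}):
        on_existing(doc["value"], doc)
        missing.discard(doc["value"])  # discard tolerates repeated matches

    # Build new documents in input order, skipping duplicate IDs.
    new_docs = [make_document(i) for i in dict.fromkeys(ids) if i in missing]

    if new_docs:  # insert_many rejects an empty list, so guard the call
        collection.insert_many(new_docs)
    return new_docs


# Hypothetical usage with PyMongo (connection string and names are assumptions):
#   from pymongo import MongoClient
#   products = MongoClient("mongodb://localhost:27017")["shop"]["products"]
#   sync_ids(products, ids, on_existing=check_product,
#            make_document=lambda i: {"value": i})
```

Using a set for the missing IDs avoids mutating the caller's list and makes each removal cheap, and the whole loop costs two round trips at most (one find, one insert_many) regardless of how many IDs there are.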