Search code examples
mongodbpymongo

MongoDb update a field in a huge collection using pymongo fast


I have an 13GB of documents in a collection in mongoDB where I need to update a field ip_address. The original value and the replacement values are given in excel sheet. I am looping through each value from excel and updating it using:

old_value={"ip_address":original_value}
new_value={"$set":{"ip_address":replacement_value}
tableConnection.update_many(old_value,new_value)

In order to process 1 update it is taking over 2 minutes. I have 1500 updates to do. Is there any better way to do it?


Solution

  • Bulk operations won't speed up your updates by much; the best way to achieve a performance increase is to add an index. This can be as simple as:

    db.collection.createIndex({'ip_address': 1})
    

    Refer to the documentation regarding potential blocking on certain older versions of the database https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/

    The index will take up addtional storage; if that is an issue you can delete the index once you've completed the updates.