python mongodb pymongo

Insert documents into MongoDB using the `insert_many` method of pymongo, ignoring "InvalidDocument"


I have an extremely large data set (extracted from nginx log files), and some of the document keys contain a `.`, which can lead to an InvalidDocument error.

Rather than filtering out these invalid documents or replacing the dots in the keys, I would prefer to just ignore them. Is there any way to ignore invalid documents when calling insert_many with pymongo?


Solution

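  • Normally you can "ignore" errors on an insert_many by setting the ordered=False parameter. That does not help with an invalid document, though: bson.errors.InvalidDocument is raised on the client while the documents are being encoded, before anything is sent to the server, so the whole call still fails.

    You can, however, insert the documents one at a time and skip the ones that fail: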

    import pymongo
    import bson.errors
    
    db = pymongo.MongoClient()['mydatabase']
    
    data_to_load = [{"ok": 1},
                    {"ok": 2},
                    {"not.ok": 3},
                    {"ok": 4},
                    {"ok": 5}]
    
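    # Insert one document at a time so a single bad document
    # does not abort the rest of the load.
    for item in data_to_load:
        try:
            db.testdata.insert_one(item)
        except bson.errors.InvalidDocument:
            # Skip documents that cannot be encoded (e.g. a "." in a key).
            pass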
    
    for item in db.testdata.find({}, {'_id': 0}):
        print(item)
    

    Result:

    {'ok': 1}
    {'ok': 2}
    {'ok': 4}
    {'ok': 5}
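
For an extremely large data set, one round trip per document may be slow. A possible variation is to screen the keys on the client first and still send the survivors with insert_many in batches. The sketch below is only an illustration: the helper names `has_unsafe_key` and `insert_valid_in_batches` and the `batch_size` parameter are hypothetical, and the rule used (a key that is not a string, contains a "." or starts with "$") is an assumption about what triggers InvalidDocument that may not match every PyMongo version.

```python
def has_unsafe_key(value):
    """Return True if any (nested) dict key looks unsafe.

    Assumption: "unsafe" means the key is not a string, contains "."
    or starts with "$". Adjust this rule to match your driver version.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            if not isinstance(key, str) or "." in key or key.startswith("$"):
                return True
            if has_unsafe_key(child):
                return True
    elif isinstance(value, (list, tuple)):
        return any(has_unsafe_key(child) for child in value)
    return False


def insert_valid_in_batches(collection, documents, batch_size=1000):
    """Insert documents in batches, skipping those with unsafe keys.

    `collection` is anything with a PyMongo-style insert_many method.
    Returns the number of documents that were skipped.
    """
    batch, skipped = [], 0
    for doc in documents:
        if has_unsafe_key(doc):
            skipped += 1
            continue
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch, ordered=False)
            batch = []
    if batch:  # insert_many rejects an empty list, so guard the final flush
        collection.insert_many(batch, ordered=False)
    return skipped
```

With the example above this would be called as `insert_valid_in_batches(db.testdata, data_to_load)`. Because `documents` is consumed one item at a time, it can also be a generator that reads the log files lazily.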