
Delete partial data in MongoDB


I have a MongoDB collection containing 372985 names. I want to delete the entries after the first 200000, so that the total number of entries drops from 372985 to 200000.

How can I do this with a MongoDB query?

Use case

My Python code cannot process this much data with my machine's configuration, so I want to reduce the size of the MongoDB collection so that the code can run in limited RAM.

If this cannot be done with a MongoDB query, can someone give a hint on how to do the same in Python?


Solution

  • You need to do it in two steps, because a delete operation takes a query filter to match the documents to remove, and MongoDB does not support skip or limit when removing documents.

    1. Find (the _ids of) the documents you want to delete, using skip to jump past the first 200000 documents.
    2. Delete the documents whose _ids are in the list found in step 1.

    You can try this in the mongo shell:

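    // Step 1: collect the _ids of every document after the first 200000.
    // Sorting on _id makes "after 200000" deterministic; without a sort the
    // documents come back in natural order, which is not guaranteed.
    var to_delete = db.collection.find({}, {_id: 1})
            .sort({_id: 1})
            .skip(200000)
            .toArray()
            .map(function(doc) { return doc._id; });

    // Step 2: delete them (deleteMany supersedes the deprecated remove()).
    db.collection.deleteMany({_id: {$in: to_delete}})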
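Since the question also asks for a Python hint, here is a minimal sketch of the same two steps with pymongo. It assumes `collection` is a pymongo `Collection` (any object with the same `find`/`sort`/`skip`/`delete_many` interface works). The function name `trim_collection` and the `batch_size` parameter are hypothetical, chosen for illustration. Deleting in batches keeps each `$in` filter well below MongoDB's 16 MB document size limit.

```python
def trim_collection(collection, keep=200000, batch_size=10000):
    """Delete every document after the first `keep` ones, ordered by _id.

    Hypothetical helper: `collection` is assumed to be a pymongo Collection
    (or any object exposing the same find/sort/skip/delete_many methods).
    Returns the number of documents deleted.
    """
    # Step 1: fetch only the _id field of the documents past the first `keep`.
    # Sorting on _id makes "the first `keep`" deterministic.
    cursor = collection.find({}, {"_id": 1}).sort("_id", 1).skip(keep)
    to_delete = [doc["_id"] for doc in cursor]

    # Step 2: delete those documents, `batch_size` ids per $in query, so no
    # single filter document grows too large.
    deleted = 0
    for start in range(0, len(to_delete), batch_size):
        batch = to_delete[start:start + batch_size]
        result = collection.delete_many({"_id": {"$in": batch}})
        deleted += result.deleted_count
    return deleted
```

Usage, assuming a local server on the default port and hypothetical database and collection names `mydb` and `names`: `from pymongo import MongoClient`, then `trim_collection(MongoClient("mongodb://localhost:27017")["mydb"]["names"])`.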