Tags: javascript, mongodb, mapreduce, mongodb-query, no-duplicates

MongoDB query to remove duplicate documents from a collection


I take input from a search box and insert it into MongoDB as a document using a regular insert query. For the word "cancer", the data is stored in the collection in the following format, with a unique "_id":

{
  "_id": {
    "$oid": "553862fa49aa20a608ee2b7b"
  },
  "0": "c",
  "1": "a",
  "2": "n",
  "3": "c",
  "4": "e",
  "5": "r"
}

Each document stores a single word in the format shown above, and I have many such documents. I want to remove the duplicate documents from the collection, but I cannot figure out how. How can I do this?


Solution

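  • An easy solution in the mongo shell is to build a unique index over every letter position with `dropDups`, then drop the index again:

    use your_db
    // list every position key, starting at '0', up to the maximum expected word length
    db.your_collection.createIndex({'0': 1, '1': 1, '2': 1, '3': 1, '4': 1, '5': 1}, {unique: true, dropDups: true, sparse: true, name: 'dropdups'})
    db.your_collection.dropIndex('dropdups')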
    

    notes:

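    • If you have many documents, expect this procedure to take a very long time.
    • Be careful: this removes documents in place. It is safer to clone your collection first and try it on the copy.
    • `dropDups` was deprecated in MongoDB 2.6 and removed in 3.0. On newer servers the unique index build fails with a duplicate key error instead of deleting the duplicates.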
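On servers that do not support `dropDups`, one possible alternative is to work out in client code which `_id`s belong to repeated words and then delete only those. The following is a minimal sketch, not a definitive implementation. It assumes documents shaped like the one in the question (one single-character value per numeric key), and the helper names `wordKey` and `findDuplicateIds` and the sample data are hypothetical.

```javascript
// Hypothetical helper: rebuild the word from the numeric keys "0", "1", ...
// Assumes each numeric key holds exactly one character, as in the question.
function wordKey(doc) {
  return Object.keys(doc)
    .filter((k) => /^\d+$/.test(k))          // keep only position keys, skip _id
    .sort((a, b) => Number(a) - Number(b))   // order by letter position
    .map((k) => doc[k])
    .join("");
}

// Hypothetical helper: return the _id of every document after the first
// one seen for each word, so the first occurrence is kept.
function findDuplicateIds(docs) {
  const seen = new Set();
  const duplicates = [];
  for (const doc of docs) {
    const key = wordKey(doc);
    if (seen.has(key)) {
      duplicates.push(doc._id);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Illustrative sample data (made up for this sketch).
const sample = [
  { _id: "a", 0: "c", 1: "a", 2: "n", 3: "c", 4: "e", 5: "r" },
  { _id: "b", 0: "c", 1: "a", 2: "n", 3: "c", 4: "e", 5: "r" },
  { _id: "c", 0: "c", 1: "a", 2: "n" },
];

console.log(findDuplicateIds(sample));
```

In the mongo shell, the input could come from `db.your_collection.find().toArray()` and the result could be passed to `db.your_collection.deleteMany({_id: {$in: ids}})` (available from MongoDB 3.2). This loads the whole collection into memory, so it only suits collections that fit comfortably in the client, and it is still worth trying on a copy first.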