java mongodb indexing duplicates crud

Create index in MongoDB 3.2 to avoid duplicated documents/rows


I'm using MongoDB 3.2 and want to avoid duplicates in my collection. To do that I use the createIndex() method (I tried different variants, but none of them works):

dbColl.createIndex(new Document("guid", 1));
dbColl.createIndex(new BasicDBObject("guid", 1));
dbColl.createIndex(new Document("guid.content", 1));
dbColl.createIndex(new BasicDBObject("guid.content", 1));

Then I insert the data with:

itemsArr.forEach(
    item -> dbColl.insertOne(Document.parse(item.toString()))
);

I run this twice and expect that the second time MongoDB will not add any new rows, since the data has already been added and there is an index on the guid field. But that's not the case: MongoDB adds duplicates despite the index.

Why does MongoDB add duplicates even if there is an index on the guid and/or guid.content field? And how can I fix it? I want to be able to add a document with a given guid only once.

Here is a sample of the document structure: Documents Schema Example

In my data the guid field is a unique document identifier.


Solution

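  • A plain index only speeds up lookups; MongoDB enforces uniqueness only when the index is created with the unique option. With the help of Phillip, I put together a complete working solution to the problem «How to avoid duplicates / skip duplicates on insert» in MongoDB 3.2 with Java Driver 3.2.0: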

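        import com.mongodb.BasicDBObject;
        import com.mongodb.ErrorCategory;
        import com.mongodb.MongoWriteException;
        import com.mongodb.client.model.IndexOptions;
        import org.bson.Document;

        IndexOptions options = new IndexOptions();

        // ensure the index is unique, so a second document with the same guid is rejected
        options.unique(true);
        // define the index
        dbColl.createIndex(new BasicDBObject("guid", 1), options);

        // add data to DB
        for (Object item : itemsArr) {

            // if there is a duplicate, skip it and write to a console (optionally)
            try {
                dbColl.insertOne(Document.parse(item.toString()));
            } catch (MongoWriteException ex) {
                // skip only duplicate-key errors; rethrow any other write failure
                if (ex.getError().getCategory() != ErrorCategory.DUPLICATE_KEY) {
                    throw ex;
                }
                //System.err.println(ex.getMessage());
            }
        }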

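    Feel free to use this ready-to-use solution. Note that creating the unique index fails if the collection already contains documents with duplicate guid values, so remove those first.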
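
    As an optional extra, repeats inside a single batch could be dropped on the client before any insert is attempted, so the unique index only has to reject documents left over from earlier runs. The following is a minimal sketch in plain Java, not part of the MongoDB driver: the class name BatchDedup and the method distinctByKey are hypothetical, and the strings in main are stand-ins for real guid values. The unique index above is still the actual safeguard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
final class BatchDedup {

    // Returns the items in their original order, keeping only the first
    // item seen for each key (for example, each guid value).
    static <T, K> List<T> distinctByKey(List<T> items,
                                        Function<? super T, ? extends K> keyOf) {
        Set<K> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T item : items) {
            // Set.add returns false when the key was already present
            if (seen.add(keyOf.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data: each string plays the role of a guid value
        List<String> guids = Arrays.asList("a1", "b2", "a1", "c3", "b2");
        System.out.println(distinctByKey(guids, g -> g)); // prints [a1, b2, c3]
    }
}
```

    In the loop above, one would pass the list of items together with a function that extracts the guid from each item, then insert only the returned list. How the guid is extracted depends on the item type, which the question does not show.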