mongodb, mongodb-query, mongodb-.net-driver, mongodb-csharp-2.0

How do I skip duplicate documents in a bulk insert, detecting duplicates by a specific field, in C#?


I need to insert many documents and ignore the duplicated docs.

Doc format:

_id:5b84e2588aceda018a974450
Name:"Jeff M"
Email:"jeff.m@xtrastaff.com"
Type:"Client"
UserId:Binary('Rw+KMGpSAECQ3gwCtfoKUg==')
UserImage:null

I want to detect duplicates by the Email field when inserting: a document should be inserted only if no document with the same email already exists.


Solution

  • To prevent duplicates from being inserted, you need a unique index on the Email field, which can be created from C# code:

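    public void CreateIndex()
    {
        // Unique = true makes the server reject any insert whose Email already exists
        var options = new CreateIndexOptions() { Unique = true };
        var field = new StringFieldDefinition<Model>(nameof(Model.Email));
        var indexDefinition = new IndexKeysDefinitionBuilder<Model>().Ascending(field);
        // Index creation fails if the collection already contains duplicate emails
        Collection.Indexes.CreateOne(indexDefinition, options);
    }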
    

    You can then insert multiple documents with a single BulkWrite operation. By default the operation is ordered: processing stops at the first insert that fails (which happens as soon as you try to insert a duplicate) and you get an exception in C#. Setting IsOrdered to false in BulkWriteOptions changes this: every insert is attempted regardless of earlier failures, in no guaranteed order, and you get a single exception that aggregates all failed inserts. That exception is of type MongoBulkWriteException, so you can catch it and ignore the duplicate-key errors. For example:

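    // Requires: using System.Linq;
    public void InsertData(List<Model> data)
    {
        var writeOps = data.Select(x => new InsertOneModel<Model>(x));
        try
        {
            // IsOrdered = false: keep processing the remaining inserts after one fails
            Collection.BulkWrite(writeOps, new BulkWriteOptions() { IsOrdered = false });
        }
        catch (MongoBulkWriteException ex)
        {
            // Thrown when at least one write failed. Duplicates are expected and
            // ignored; any other kind of failure is rethrown.
            if (ex.WriteConcernError != null ||
                ex.WriteErrors.Any(e => e.Category != ServerErrorCategory.DuplicateKey))
            {
                throw;
            }
        }
    }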
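
If you also need to know which documents were skipped, the aggregated exception can be inspected. The following is a minimal sketch, not a definitive implementation. It assumes a trimmed, hypothetical `Model` class (the real one is not shown in the question), a hypothetical helper named `InsertSkippingDuplicates`, and that the unique index on Email already exists. It relies on `BulkWriteError.Index` referring to the position of the failed request in the submitted list, which is my understanding of the driver's behavior but worth verifying against the driver version you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical, trimmed model: only the fields needed for this sketch.
public class Model
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class DuplicateAwareInsert
{
    // Inserts all documents and returns the ones rejected as duplicates.
    public static IReadOnlyList<Model> InsertSkippingDuplicates(
        IMongoCollection<Model> collection, IReadOnlyList<Model> data)
    {
        // Assumption: BulkWrite throws on an empty request list, so return early.
        if (data.Count == 0)
        {
            return Array.Empty<Model>();
        }

        // Materialize the requests so request i corresponds to data[i].
        var writeOps = data.Select(x => new InsertOneModel<Model>(x)).ToList();
        try
        {
            collection.BulkWrite(writeOps, new BulkWriteOptions { IsOrdered = false });
            return Array.Empty<Model>(); // nothing was rejected
        }
        catch (MongoBulkWriteException<Model> ex)
        {
            // Only duplicate-key errors are expected; rethrow anything else.
            if (ex.WriteConcernError != null ||
                ex.WriteErrors.Any(e => e.Category != ServerErrorCategory.DuplicateKey))
            {
                throw;
            }

            // Map each error back to the document that caused it.
            return ex.WriteErrors.Select(e => data[e.Index]).ToList();
        }
    }
}
```

A call such as `var skipped = DuplicateAwareInsert.InsertSkippingDuplicates(Collection, data);` then yields the rejected documents, for logging or reporting. Note that a duplicate can also come from within the batch itself: if two documents in `data` share an email, one of them is inserted and the other is reported as skipped.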