c# mongodb mongodb-.net-driver parallel.foreach parallels

Error writing to MongoDB using Parallelism

I have a MongoDB collection whose documents contain subdocuments. I read XML files and save them to MongoDB; each XML file becomes one document.

My classes

public class Header
{
    public Header()
    {
        Operations = new List<Operation>();
    }

    public ObjectId Id { get; set; }
    public Int64 Code1 { get; set; }
    public Int64 Code2 { get; set; }
    public string Name { get; set; }
    public List<Operation> Operations { get; set; }
}

public class Operation
{
    public Operation()
    {
        Itens = new List<Item>();
    }

    public string Value { get; set; }
    public List<Item> Itens { get; set; }
}

public class Item
{
    public string Value { get; set; }
}

In the Header class, Code1 and Code2 are used to create the document's index in MongoDB. Code1 and Code2 make up the XML file name, and since all the files are in one folder, there should be no possibility of duplicates.

Writing to MongoDB

To write to MongoDB I am using the following code:

var po = new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount > 1 ? Environment.ProcessorCount / 2 : 1 };

Parallel.ForEach(arquivos, po, (arquivo, state) =>
{
    MongoCollection<Header> collection = MongoConnect.GetHeader();

    try
    {
        // Code1 and Code2 come from the file name; split it only once.
        string[] partes = arquivo.Name.Split('-');

        var Header = new Header();
        Header.Name = @"Valor 1";
        Header.Code1 = partes.Length > 1 ? Int64.Parse(partes[1].Replace(".", "")) : 0;
        Header.Code2 = partes.Length > 2 ? Int64.Parse(partes[2].Replace(".siag", "")) : 0;

        // record: the XML node loaded from the file (loading code omitted).
        var body = record.SelectSingleNode("body");
        if (body != null)
        {
            string[] linhas = body.InnerText.Split(new String[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var linha in linhas)
            {
                string conteudo = linha;
                var operation = new Operation();
                if (!conteudo.Contains("\t"))
                {
                    // A line without tabs starts a new operation.
                    string tipo = conteudo.Substring(0, conteudo.IndexOf(' ')).Trim();
                    string tabela = conteudo.Substring(0, conteudo.IndexOf("Quando:", System.StringComparison.Ordinal)).Trim();
                    operation.Value = tabela;
                    conteudo = conteudo.Remove(0, (tabela + " Quando:").Length);

                    Header.Operations.Add(operation);
                }
                else
                {
                    // A tab-separated line is an item of the most recent operation.
                    var item = new Item();
                    string[] campos = conteudo.Split('\t');
                    item.Value = campos[0];

                    Header.Operations.Last().Itens.Add(item);
                }
            }

            try
            {
                collection.Save(Header);
            }
            catch (Exception ex)
            {
                // The duplicate key error shows up here
            }
        }
    }
    catch (Exception ex)
    {
        // Log error here
    }
});

Note: Please disregard the file-reading code; it is there only for illustration.

Full Error

Erro: WriteConcern detected an error ''. (Response was { "ok" : 1, "code" : 11000, "err" :
 "insertDocument :: caused by :: 11000 E11000 duplicate key error index:
 DB.Collection.$Codigo1_1_Codigo2_1  dup key: { : 359922397, : 1217185957 }", 
"n" : NumberLong(0) })

Solution

The reason can be found in the driver's source code, which shows how an ObjectId gets created:

static ObjectId()
{
    __staticMachine = (GetMachineHash() + AppDomain.CurrentDomain.Id) & 0x00ffffff; // add AppDomain Id to ensure uniqueness across AppDomains
    __staticIncrement = (new Random()).Next();

    try
    {
        __staticPid = (short)GetCurrentProcessId(); // use low order two bytes only
    }
    catch (SecurityException)
    {
        __staticPid = 0;
    }
}

But if you run:

var po = new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount > 1 ? Environment.ProcessorCount / 2 : 1 };

var items = new List<string>()
{
    "Foo",
    "Bar"
};
Parallel.ForEach(items, po, (arquivo, state) =>
{
    Console.WriteLine((new Random()).Next());
});

You get:

  1259271181
  1259271181

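This happens because Random does not behave well when instances are created in parallel: on .NET Framework, new Random() is seeded from the system clock, so instances created at almost the same moment can produce the same sequence. You either have to define an Id that does not use ObjectId, or make the generation thread-safe.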
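One way to make random numbers safe to use inside Parallel.ForEach is to give each thread its own Random instance, seeded from a single source that is only touched under a lock. The sketch below is illustrative only: SafeRandom and its members are hypothetical names, not part of the MongoDB driver, and the sketch does not change how the driver itself generates ObjectId values.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the MongoDB driver.
public static class SafeRandom
{
    // Shared generator, used only under a lock to hand out seeds.
    private static readonly Random SeedSource = new Random();

    // Each thread lazily creates its own Random with its own seed,
    // instead of relying on the clock-based default seed.
    private static readonly ThreadLocal<Random> PerThread =
        new ThreadLocal<Random>(() =>
        {
            lock (SeedSource)
            {
                return new Random(SeedSource.Next());
            }
        });

    public static int Next()
    {
        return PerThread.Value.Next();
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new[] { "Foo", "Bar" };
        Parallel.ForEach(items, item =>
        {
            Console.WriteLine(item + ": " + SafeRandom.Next());
        });
    }
}
```

Because every thread draws from its own instance, no Random object is ever shared between threads, and the seeds no longer depend on the instances being created at different clock ticks.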

Based on our discussion in the comments, I would define the Header class like this:

public class Header
{
    public Header()
    {
        Operations = new List<Operation>();
    }

    // Code1 and Code2 together form the document's _id.
    [BsonId]
    public Codes Id { get; set; }
    public Int64 Code2 { get; set; }
    public string Name { get; set; }
    public List<Operation> Operations { get; set; }
}

public class Codes
{
    public Int64 Code1 { get; set; }
    public Int64 Code2 { get; set; }
}
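With a compound Id like this, two files that yield the same (Code1, Code2) pair will always collide, so it may be worth checking the file names for repeated pairs before saving. The following is a rough sketch under assumptions: the file names are made up, the parsing mirrors the question's Split('-') and Replace logic, and CodePairs, Parse and FindDuplicates are hypothetical helpers that do not touch MongoDB.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; it only inspects file names.
public static class CodePairs
{
    // Assumed format, as in the question: "<prefix>-<Code1>-<Code2>.siag".
    public static Tuple<long, long> Parse(string fileName)
    {
        string[] parts = fileName.Split('-');
        if (parts.Length < 3)
        {
            throw new FormatException("Unexpected file name: " + fileName);
        }
        long code1 = long.Parse(parts[1].Replace(".", ""));
        long code2 = long.Parse(parts[2].Replace(".siag", ""));
        return Tuple.Create(code1, code2);
    }

    // Returns every (Code1, Code2) pair produced by more than one file name.
    public static List<Tuple<long, long>> FindDuplicates(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(Parse)
            .GroupBy(pair => pair)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up names: the first two differ as strings but parse to the same pair.
        var names = new[] { "log-12.3-45.siag", "log-1.23-45.siag", "log-7-8.siag" };
        foreach (var pair in CodePairs.FindDuplicates(names))
        {
            Console.WriteLine("Duplicate key: " + pair.Item1 + ", " + pair.Item2);
        }
    }
}
```

Note that removing the dots from Code1 means two distinct file names can map to the same pair, so unique file names in a folder do not by themselves guarantee unique keys.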