Search code examples
c#sql-serverentity-frameworkthread-safetyparallel.foreach

Parallel.ForEach and SQL insert in batches of 20


I am writing some code inside a Parallel.ForEach statement. Each thread is supposed to do some work, then insert the result in a List. When the list hits a count of %20, it should insert the items into the DB, as I don't want each thread to open a new connection and insert one item. Obviously, with the below code, many threads are modifying my list and it is erroring with Collection was modified. I also can't use Lock as I lose the benefit of parallelization if every thread has to wait for the lock to modify the list.

What would be the best way to achieve the above? I tried pausing the Foreach and then resuming but was not successful.

private ItemsDAL _itemsDal = new ItemsDAL();
public void Run()
{
    List<Item> items = new List<Item>();

    for (int i = 0; i < 1500; i++)
    {
        items.Add(new Item()
        {
            Details = "Test " + i
        });
    }

    Parallel.ForEach(items, (oneItem) =>
    {
        Console.WriteLine("Adding item with Details " + oneItem.Details);
        _itemsDal.InsertToDatabase(oneItem);
    });

}

DAL Class

public class ItemsDAL
{
    object _lock = new object();
    private List<Item> _itemsToInsert = new List<Item>();

    public void InsertToDatabase(Item item)
    {
        _itemsToInsert.Add(item);
        if (_itemsToInsert.Count >= 20)
        {

            //Add to database in bulk (Blocs of 20)
            using (ForeachContext db = new ForeachContext())
            {
                foreach (var itemToInsert in _itemsToInsert)
                {
                    db.Items.Add(itemToInsert);
                }
                db.SaveChanges();
            }

            _itemsToInsert.Clear();
        }
    }
}

Solution

  • I also can't use Lock as I lose the benefit of parallelization if every thread has to wait for the lock to modify the list.

    Not really. Adding an item to the list is very fast. And locking for just that operation you're not going to loose any benefit of parallelization for the part where the threads "do some work" before inserting into the list.

    Really, you can simply use a lock(_lock) {...} block for the whole body of InsertToDatabase. And that will

    pause all threads when my list hits a count of 20, and then add the items to the DB

    The case where the list has less than 20 items, the lock lasts only for the duration of

    _itemsToInsert.Add(item);
    

    which in the context of a method that makes API calls or writes to a database is negligible.