Tags: c#, parallel-processing, thread-safety, .net-4.5, parallel.foreach

Clearing thread-safe collections inside a Parallel.ForEach loop after inserting into a SQL database


I have a long-running process that uses a Parallel.ForEach loop. Inside that loop, I create instances of two different classes depending on what is passed in, perform some minor tasks, and then add the instances to thread-safe collections. When processing is complete, all of the data needs to be inserted into a SQL database.

The problem is that the amount of data produced is too large to keep in the collections until all processing is done. I have to push the retained data to SQL periodically and then remove what was pushed from the collection, so that processing can continue without running out of memory, and I don't know the best way to do that. If the code weren't multithreaded, this would be easy: check the count of the collection and, if it exceeds a certain amount, call a function that pushes the contents to SQL through a bulk insert or a table-valued parameter, then clear the collection in the next statement. What is the best way to accomplish this within a Parallel.ForEach?
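For reference, the single-threaded version described above could look roughly like this. It is a minimal sketch: the class name SingleThreadedBatcher and the flush delegate (standing in for the bulk insert or table-valued parameter call) are hypothetical. It only works because one thread owns the list; under Parallel.ForEach, other threads can add items between the Count check and the point where the list is replaced, which is exactly the race the question is about.

```csharp
using System;
using System.Collections.Generic;

// Single-threaded batching: buffer items and, once the count reaches a
// threshold, push the batch and start a new list.
// The flush delegate is a hypothetical stand-in for the SQL bulk insert.
// NOT thread-safe: only one thread may call Add/Flush.
public sealed class SingleThreadedBatcher<T>
{
    private readonly int _threshold;
    private readonly Action<List<T>> _flush;
    private List<T> _buffer = new List<T>();

    public SingleThreadedBatcher(int threshold, Action<List<T>> flush)
    {
        _threshold = threshold;
        _flush = flush;
    }

    public void Add(T item)
    {
        _buffer.Add(item);
        if (_buffer.Count >= _threshold)
        {
            Flush();
        }
    }

    // Push whatever is buffered; call once more at the end for the remainder.
    public void Flush()
    {
        if (_buffer.Count == 0)
        {
            return;
        }
        _flush(_buffer);
        _buffer = new List<T>();
    }
}
```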

I'm open to using any thread-safe collection. So far I've been using ConcurrentQueue and have considered switching to BlockingCollection, because I didn't see a way to clear a ConcurrentQueue. I don't care in what order the contents are inserted, but I do need to be able to at least remove what was pushed to the SQL database.

My best solution so far is to use BlockingCollection.GetConsumingEnumerable(). Once the collection holds more than x items, I would copy its contents to another thread-safe collection, perform the insert, and then use that copy to remove the same items from the original through BlockingCollection.GetConsumingEnumerable(). Once that's done, I would dispose of the temporary list. I suspect there is a better way, though, because if I have to remove items one at a time, it somewhat defeats the purpose of making the process multithreaded.
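ConcurrentQueue<T> has no Clear() method in .NET Framework 4.5, but the "copy, then remove" step described above can be collapsed into a single pass with TryDequeue, which removes each item as it copies it. This is a swapped-in alternative, not the approach from the question; the helper name DrainBatch is hypothetical. The trade-off is that items leave the queue before the insert runs, so a failed insert loses that batch unless the caller re-enqueues or retries it.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class QueueDrainer
{
    // Removes up to maxItems items from the queue and returns them as a batch.
    // TryDequeue removes each item atomically, so even if several threads drain
    // at once, every item ends up in exactly one batch. Items enqueued while the
    // drain is running are either picked up here or left for the next batch.
    public static List<T> DrainBatch<T>(ConcurrentQueue<T> queue, int maxItems)
    {
        var batch = new List<T>();
        T item;
        while (batch.Count < maxItems && queue.TryDequeue(out item))
        {
            batch.Add(item);
        }
        return batch;
    }
}
```

A producer inside the loop could enqueue its item and, when queue.Count passes the threshold, call DrainBatch and insert the returned list; no separate removal step is needed.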

I have seen Monitor.Pulse and Monitor.Wait used for this, but I couldn't find a use case that seemed safe. The risk is that another thread could add an item after I check that the collection is over the threshold, and that item could then be cleared before it was inserted into the SQL database.

I'm using .NET Framework 4.5, and I'm managing two different collections that need to be pushed, but not necessarily at the same time.


Solution

  • I wouldn't recommend clearing a concurrent collection. Instead, I would 'replace' it with a new one, and process the contents of the old one while the other threads push their content to the new one.

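    Interlocked.Exchange is the technique I would use to accomplish this: it atomically stores a new, empty collection in the shared field and returns the old one to the caller. Threads that read the old reference just before the swap may still be adding to it, so the thread doing the insert should make sure those writes have finished before it reads the old collection.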
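A minimal sketch of the swap approach, assuming a hypothetical SwappingBuffer<T> class and a hypothetical flush delegate standing in for the SQL insert (SqlBulkCopy or a table-valued parameter would go there). Each buffer is a small Batch object holding a ConcurrentQueue<T> plus a count of producers currently writing to it. Flush() swaps in an empty Batch with Interlocked.Exchange and then waits until the old batch has no registered writers before reading it, which covers a thread that grabbed the old reference just before the swap. A producer re-checks the shared reference after registering and retries on the new batch if it was swapped out in between.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// "Swap, don't clear": producers add to whichever Batch _current points at;
// Flush atomically swaps in an empty Batch and then owns the old one.
// The flush delegate is a hypothetical stand-in for the SQL bulk insert.
public sealed class SwappingBuffer<T>
{
    private sealed class Batch
    {
        public readonly ConcurrentQueue<T> Items = new ConcurrentQueue<T>();
        public int Count;    // items added to this batch
        public int Writers;  // producers currently adding to this batch
    }

    private readonly int _threshold;
    private readonly Action<List<T>> _flush;
    private Batch _current = new Batch();

    public SwappingBuffer(int threshold, Action<List<T>> flush)
    {
        if (threshold <= 0) throw new ArgumentOutOfRangeException("threshold");
        if (flush == null) throw new ArgumentNullException("flush");
        _threshold = threshold;
        _flush = flush;
    }

    // Safe to call concurrently, e.g. from a Parallel.ForEach body.
    public void Add(T item)
    {
        while (true)
        {
            Batch batch = Volatile.Read(ref _current);
            Interlocked.Increment(ref batch.Writers);
            if (!ReferenceEquals(batch, Volatile.Read(ref _current)))
            {
                // Swapped out before we registered: back off and retry on the new batch.
                Interlocked.Decrement(ref batch.Writers);
                continue;
            }

            batch.Items.Enqueue(item);
            int count = Interlocked.Increment(ref batch.Count);
            Interlocked.Decrement(ref batch.Writers);

            // Exactly one producer sees the count hit the threshold,
            // so each batch triggers at most one flush.
            if (count == _threshold)
            {
                Flush();
            }
            return;
        }
    }

    // Swaps in an empty batch and pushes the old one. Also call this once
    // after Parallel.ForEach completes to push the remainder.
    public void Flush()
    {
        Batch old = Interlocked.Exchange(ref _current, new Batch());

        // Wait for producers that registered on the old batch before the swap.
        var spin = new SpinWait();
        while (Volatile.Read(ref old.Writers) != 0)
        {
            spin.SpinOnce();
        }

        var items = new List<T>(old.Items);
        if (items.Count > 0)
        {
            _flush(items);
        }
    }
}
```

Inside the loop body you would call buffer.Add(item), and after Parallel.ForEach returns, call buffer.Flush() once to push what is left. For the two collections in the question, two instances with their own thresholds flush independently. Be aware that two different batches can be flushed at the same time by different threads, and that an exception thrown by the flush delegate surfaces from Parallel.ForEach as an AggregateException, with that batch not written.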