C# The best way to callback a client application with interim results from a pool of threads

I have written a C# library which has a method to count words from multiple passages of text in parrallel. The passages of text are given as character streams where there is a random delay each time getnextchar() is called. My library method has to take an array of these character streams and return a combined word-frequency count. To do this I have a safely shared word-frequency data structure and a thread to read each character stream and update the shared collection. When all the threads have completed then I return the data structure to the client application.

The client application needs interim results of the combined word counting every 10 seconds. To do this I use a delegate to call back the client every 10 seconds with the results untill all of the worker threads have completed, after which I return the final results to the client.

My problem is that when I callback the client with the interim results I have to lock my shared data structure and wait for the client application to return from the callback before I can un-lock it. Whilst the callback is executing all of the worker threads are blocked waiting for the lock on the data structure. This doesnt seem like a sensible thing to do, because I dont think I should rely or trust the client code to return promptly or even at all. However they only other way I can think of doing it which does not rely on the client code is to make a copy or snapshot of my data structure and pass that to the client through the callback. This is at the expense of memory and computation but once the copy is made the workers can continue updating the shared collection and the callback can do whatever it wants.

My question is two fold:

1) Which is the lesser of two evils, allowing the possibility of a bad client callback implementation to block the workers, or periodically performing an expensive operation.

2)Is there a way to solve this problem which doesnt do either of the above?

Solution

I would definitely say the greater of the two evils is trusting the client. Calling back into unknown code while holding a lock is a recipe for deadlocks and worse scenarios. Yes, you will pay an overhead to take a snapshot of your structure and return it. But the extra memory for a snapshot outweighs the risk of calling out while holding a lock.

I've run into similar situations and thus far the snapshot has not been a problem. If it is though, you can find several ways to work around this. Including increasing the frequency with which you call the client which will reduce the amount of data you have to snapshot at a given time.

Another way around this is to use an immutable data structure. When you are ready to talk to the client simply break off the current version and hand it to the client. Allow your background threads to start building a new one.