c# multithreading azure-worker-roles pool azure-cloud-services

C#: Processing a batch of data with multiple instances of the same class


We have a web application running in the Azure cloud as a C#/.NET worker role. Part of this application deciphers a lot of short strings in different objects (about 2000 per request). We want to make this as fast as possible, and we need to manage multithreading correctly.

Our original implementation had each object create a new instance of the class and a new thread to decrypt its string, but that took too much time.

If we construct only one instance of the class and run all the data through it, it is much faster than constructing the class over and over again for each object.

The question is how to improve on that. We want to do something like this, but have no idea how:

  1. Create a pool of data to be decrypted.
  2. Create multiple instances of the same class (ideally one per CPU core), each running on its own thread.
  3. Have some sort of mechanism to feed those instances with data.
  4. When the pool is empty, close all threads.

We don't want to start a new thread for every object in the pool, but rather have a limited number of threads running in parallel, feeding on the same list of data and processing it one item at a time.
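The four steps above could be sketched roughly as follows. This is only a minimal illustration, not the implementation from the question: `WorkerPoolSketch`, `Decryptor` and `DecryptAll` are hypothetical names, and the `Decrypt` body is a placeholder (it just reverses the string) standing in for the real, expensive-to-construct decryption class.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WorkerPoolSketch
{
    // Hypothetical stand-in for the class that is expensive to construct.
    public sealed class Decryptor
    {
        public string Decrypt(string cipherText)
        {
            // Placeholder only: reverses the input instead of decrypting it.
            char[] chars = cipherText.ToCharArray();
            Array.Reverse(chars);
            return new string(chars);
        }
    }

    public static string[] DecryptAll(IReadOnlyList<string> items, int workerCount)
    {
        // Step 1: the pool of data, as a thread-safe queue of (index, value) pairs.
        var pool = new ConcurrentQueue<KeyValuePair<int, string>>(
            items.Select((value, index) => new KeyValuePair<int, string>(index, value)));

        // Each slot is written by exactly one worker, so no extra locking is needed.
        var results = new string[items.Count];

        // Step 2: a fixed number of workers, each with its own Decryptor instance.
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                var decryptor = new Decryptor(); // constructed once per worker

                // Step 3: every worker feeds on the same pool.
                // Step 4: when the pool is empty, the loop ends and the worker exits.
                while (pool.TryDequeue(out KeyValuePair<int, string> item))
                {
                    results[item.Key] = decryptor.Decrypt(item.Value);
                }
            }, TaskCreationOptions.LongRunning)) // hint: use a dedicated thread
            .ToArray();

        Task.WaitAll(workers);
        return results;
    }
}
```

A call such as `WorkerPoolSketch.DecryptAll(strings, Environment.ProcessorCount)` would then use one worker per logical core. The `Parallel.ForEach` overloads that take a `localInit` delegate are another way to get a similar per-worker instance without managing the tasks by hand.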


UPDATE 1:

We tried the approaches mentioned in the comments, especially Storage Queue and WebJob, but due to the structure of our code, significant changes would have been necessary, with an uncertain result. So that wasn't the way to go.

In the end we did the following; the results are shared at the end:

We create up to 12 instances of the "decrypt-or", a deciphering class that uses AES-256. The number 12 is only the upper limit; in reality only 4 - 6 instances are created, depending on load. Instances are closed when the main queue is depleted.

All objects that have to be deciphered are in the main queue, and every "decrypt-or" instance has its own imaginary queue. We take objects from the main queue one by one and hand each to a "decrypt-or" with 0 objects in its imaginary queue or, failing that, to the one with the lowest count of objects.
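The dispatch rule described above might look something like the sketch below. It is written under several assumptions: the names `LeastLoadedDispatchSketch`, `Worker`, `Process` and `decryptorFactory` are hypothetical, the per-instance "imaginary queue" is modelled as a `BlockingCollection`, and "created based on load" is interpreted as "start another instance only when every existing one already has pending items and the cap has not been reached".

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LeastLoadedDispatchSketch
{
    // Hypothetical worker: one "decrypt-or" plus its own queue of pending items.
    private sealed class Worker
    {
        public readonly BlockingCollection<KeyValuePair<int, string>> Pending =
            new BlockingCollection<KeyValuePair<int, string>>();
        public Task Loop;
    }

    // decryptorFactory is called once per worker, so each worker owns its instance.
    public static string[] Process(
        IReadOnlyList<string> mainQueue,
        int maxWorkers,
        Func<Func<string, string>> decryptorFactory)
    {
        var results = new string[mainQueue.Count];
        var workers = new List<Worker>();

        for (int i = 0; i < mainQueue.Count; i++)
        {
            // Pick the worker with the fewest pending items (an idle one has 0).
            Worker target = workers.OrderBy(w => w.Pending.Count).FirstOrDefault();

            // Assumption: grow the set of workers only when all are busy and the cap allows it.
            if (target == null || (target.Pending.Count > 0 && workers.Count < maxWorkers))
            {
                target = StartWorker(decryptorFactory(), results);
                workers.Add(target);
            }

            target.Pending.Add(new KeyValuePair<int, string>(i, mainQueue[i]));
        }

        // Main queue depleted: let every worker drain its own queue and stop.
        foreach (Worker w in workers) w.Pending.CompleteAdding();
        Task.WaitAll(workers.Select(w => w.Loop).ToArray());
        return results;
    }

    private static Worker StartWorker(Func<string, string> decrypt, string[] results)
    {
        var worker = new Worker();
        worker.Loop = Task.Factory.StartNew(() =>
        {
            // Blocks until items arrive; ends after CompleteAdding once the queue is empty.
            foreach (KeyValuePair<int, string> item in worker.Pending.GetConsumingEnumerable())
            {
                results[item.Key] = decrypt(item.Value);
            }
        }, TaskCreationOptions.LongRunning);
        return worker;
    }
}
```

With a cap of 12, a call could look like `LeastLoadedDispatchSketch.Process(items, 12, () => CreateDecryptor())`, where `CreateDecryptor` is a placeholder for whatever builds the real AES-based instance. How many workers are actually started depends on how quickly they drain their queues, which is one plausible reading of the 4 - 6 instances observed in practice.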

Results

The "get all" method is the one that took the most time, so it is our reference:

  1. Original implementation: 6.39 seconds / CPU load 100% on 16 cores
  2. Implementation with one instance of "decrypt-or": 1.62 seconds / CPU load 50 - 60% on 16 cores
  3. 12 instances of "decrypt-or": 1.27 seconds / CPU load 20 - 25% on 16 cores

As you can see, we were able to decrease the time by about 21% compared to the single-instance implementation. More importantly, we reduced CPU usage, so we will try to reduce the number of cores without compromising speed.
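For reference, the percentages can be re-derived from the timings listed above with a small helper. `TimingComparison` and `ReductionPercent` are hypothetical names used only for this illustration.

```csharp
using System;

public static class TimingComparison
{
    // Relative reduction of `after` compared to `before`, in percent.
    public static double ReductionPercent(double before, double after)
    {
        return (before - after) / before * 100.0;
    }
}
```

`TimingComparison.ReductionPercent(1.62, 1.27)` gives roughly 21.6 (single instance vs. 12 instances), and `TimingComparison.ReductionPercent(6.39, 1.27)` gives roughly 80.1 (original implementation vs. 12 instances).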

The next step will be larger performance tests to see what the limits of this approach are.


Solution

  • The answer is in Update 1 above:

    We create up to 12 "decrypt-or" instances (AES-256 deciphering instances); in practice only 4 - 6 are created, depending on load. Each instance has its own imaginary queue, and every object from the main queue goes to an instance with an empty queue or, failing that, to the one with the fewest pending objects. Instances are closed when the main queue is depleted.