c#, task-parallel-library, long-running-processes

Is this the correct implementation?


I have a Windows Service that needs to pick up jobs from a database and process them.

Each job is a scanning process that takes approximately 10 minutes to complete.

I am very new to the Task Parallel Library. I have implemented the following sample logic:

Queue<int> queue = new Queue<int>();

for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
}

for (int i = 0; i < 100; i++)
{
    Task.Factory.StartNew((object data) =>
    {
        var objData = (Queue<int>)data;
        Console.WriteLine(objData.Dequeue());
        Console.WriteLine(
            "The current thread is " + Thread.CurrentThread.ManagedThreadId);
    }, queue, TaskCreationOptions.LongRunning);
}

Console.ReadLine();

But this creates a lot of threads. Since the loop repeats 100 times, it creates 100 threads.

Is it the right approach to create that many parallel threads?

Is there any way to limit the number of threads to 10 (concurrency level)?


Solution

  • An important factor to remember when allocating new threads is that the OS has to allocate a number of logical entities for each thread to run:

    1. Thread kernel object - an object describing the thread, including the thread's context, CPU registers, etc.
    2. Thread environment block - for exception handling and thread-local storage
    3. User-mode stack - 1 MB of stack space by default
    4. Kernel-mode stack - for passing arguments from user mode to kernel mode

    Other than that, the number of threads that can actually run concurrently depends on the number of cores your machine has, and creating more threads than you have cores will cause context switching, which in the long run may slow your work down.
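    As a rough illustration, .NET exposes the machine's logical processor count through Environment.ProcessorCount, which you can use as an upper bound when picking a degree of parallelism. A minimal sketch (the cap of 10 simply mirrors the concurrency level asked about in the question, not a tuned value):

    // Logical processors available to the scheduler on this machine
    int cores = Environment.ProcessorCount;

    // Bound the degree of parallelism by the core count; 10 is just the
    // level from the question, not a recommendation.
    int maxDegreeOfParallelism = Math.Min(10, cores);
    Console.WriteLine("Cores: " + cores + ", parallelism: " + maxDegreeOfParallelism);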

    So, after the long intro, on to the good stuff: what we actually want to do is limit the number of threads running and reuse them as much as possible.

    For this kind of job, I would go with TPL Dataflow, which is built around the producer-consumer pattern. Here is a small example of what can be done:

    // Requires the System.Threading.Tasks.Dataflow NuGet package:
    // using System.Threading.Tasks.Dataflow;

    // A BufferBlock is an equivalent of a ConcurrentQueue to buffer your objects
    var bufferBlock = new BufferBlock<object>();

    // An ActionBlock to process each object and do something with it
    var actionBlock = new ActionBlock<object>(obj =>
    {
        // Do stuff with the objects from the bufferBlock
    });

    // Link the blocks and propagate completion from the buffer to the ActionBlock
    bufferBlock.LinkTo(actionBlock);
    bufferBlock.Completion.ContinueWith(t => actionBlock.Complete());


    You may pass each block an ExecutionDataflowBlockOptions, which lets you limit the BoundedCapacity (the number of objects buffered inside the block) and set MaxDegreeOfParallelism (the maximum number of items the block will process concurrently).
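    For the scenario in the question, a minimal sketch of wiring those options up might look like the following (assuming the jobs are just the integers from the sample; the limits of 10 and 100 are illustrative values, not recommendations):

    // Buffer incoming job ids
    var bufferBlock = new BufferBlock<int>();

    var actionBlock = new ActionBlock<int>(jobId =>
    {
        // Process one job here (the ~10 minute scan from the question)
        Console.WriteLine("Processing job " + jobId +
            " on thread " + Thread.CurrentThread.ManagedThreadId);
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 10, // at most 10 jobs run concurrently
        BoundedCapacity = 100        // at most 100 items held inside the ActionBlock at once
    });

    // PropagateCompletion replaces the manual ContinueWith from the sample above
    bufferBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

    for (int i = 0; i < 10000; i++)
    {
        bufferBlock.Post(i); // items wait in the BufferBlock until the ActionBlock has room
    }

    bufferBlock.Complete();
    actionBlock.Completion.Wait(); // block until every job has been processed

    With this setup, at most ten jobs are in flight at any moment and ThreadPool threads are reused, instead of a dedicated LongRunning thread being created per job.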

    There is a good example here to get you started.