c# multithreading parallel.foreach

Single thread working and multithread not working


Below is my current code, which gets 500 documents (JSON format) from DocumentDB per call. I can only fetch 500 per search, and I add them to a concurrent bag (in parallel). The documents fetched depend on the id I pass to the API, which returns the range that follows it, e.g. id = 500 gets documents 501 to 1000. The code below fills the concurrent bag with 25k documents, as expected.

    int threadNumber = 5;
    var concurrentBag = new ConcurrentBag<docClass>();

    if (batch == 25000)
    {
        id = 500;
        while (id <= 25000)
        {
            docs = await client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions);
            docClass lastdoc = docs.Documents.Last();
            lastid = lastdoc.Id.Id;

            Parallel.ForEach(docs.Documents, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, item =>
            {
                concurrentBag.Add(item);
            });
            id = id + 500;
        }
    }

I wanted to run this whole while loop on multiple threads so that I can make several API calls at once and fetch 500 documents per call in parallel. I tried to modify the code as below, but after the whole run I always see only 500 documents in the concurrent bag 'concurrentBag', and the skip id stays at 500 and doesn't increment.

    int threadNumber = 5;
    var concurrentBag = new ConcurrentBag<docClass>();

    if (batch == 25000)
    {
        id = 500;
        Task[] tasks = new Task[threadNumber];

        for (int j = 0; j < threadNumber; j++)
        {
            tasks[j] = Task.Run(async() =>
            {
                while (id <= 25000)
                {
                    docs = await client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions);
                    docClass lastdoc = docs.Documents.Last();
                    lastid = lastdoc.Id.Id;

                    Parallel.ForEach(docs.Documents, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, item =>
                    {
                        concurrentBag.Add(item);
                    });

                    id = id + 500;
                }
            });
        }
    }

Can you please help me see what I am doing wrong here?


Solution

  • To load documents from external resources, use an asynchronous approach without extra threads.

    Note that when you download external resources in parallel, the extra threads do no work; they just wait for the response, so those threads are wasted ;)

    The asynchronous approach lets you launch multiple requests almost simultaneously, without waiting for each task to complete in turn, and wait only once, until all tasks are done.

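    var maxDocuments = 25000;
    var step = 500;

    // Start one search per id (500, 1000, ..., 25000) without awaiting each one in turn.
    var documentTasks = Enumerable.Range(1, int.MaxValue)
        .Select(offset => step * offset)
        .TakeWhile(id => id <= maxDocuments)
        .Select(id => client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions))
        .ToArray();

    // Wait until every request has completed.
    await Task.WhenAll(documentTasks);

    // All tasks are complete here, so reading Result does not block.
    var allDocuments = documentTasks
        .Select(task => task.Result)
        .SelectMany(result => result.Documents)
        .ToArray();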
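
If the API should not receive all 50 requests at once, the same Task.WhenAll pattern can be capped at a few requests in flight. The following is only a minimal sketch under stated assumptions: FakeClient and its SearchDocumentsAsync method are hypothetical stand-ins for the real client, the documents are plain integers, and the limit of 5 simply mirrors threadNumber from the question. Each task gets its own id instead of sharing one captured counter, and the results are read only after Task.WhenAll has been awaited; the tasks array in the question's second version does not appear to be awaited in the code shown, which may be why the bag was inspected before the other calls finished.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real document client; not part of any library.
public static class FakeClient
{
    // Pretends to fetch the `step` documents that follow `id` (id = 500 -> 501..1000).
    public static async Task<IReadOnlyList<int>> SearchDocumentsAsync(int id, int step)
    {
        await Task.Delay(10); // simulated network latency
        return Enumerable.Range(id + 1, step).ToList();
    }
}

public static class ThrottledFetch
{
    public static async Task<int[]> FetchAllAsync(int maxDocuments, int step, int maxConcurrency)
    {
        // Allows at most maxConcurrency requests in flight at any moment.
        using var throttle = new SemaphoreSlim(maxConcurrency);

        // Each task is given its own id, so no counter is shared between tasks.
        var tasks = Enumerable.Range(1, maxDocuments / step)
            .Select(async offset =>
            {
                await throttle.WaitAsync();
                try
                {
                    return await FakeClient.SearchDocumentsAsync(offset * step, step);
                }
                finally
                {
                    throttle.Release();
                }
            })
            .ToArray();

        // Awaiting here ensures every page has arrived before the results are read.
        var pages = await Task.WhenAll(tasks);
        return pages.SelectMany(page => page).ToArray();
    }

    public static async Task Main()
    {
        var all = await FetchAllAsync(maxDocuments: 25000, step: 500, maxConcurrency: 5);
        Console.WriteLine(all.Length); // 50 pages of 500 documents each
    }
}
```

Because the work is I/O-bound, the semaphore limits how many requests are outstanding rather than how many threads exist, so no Task.Run or Parallel.ForEach is needed, and a plain array is enough to collect the results.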