c# multithreading asynchronous backgroundworker large-files

Multithreaded reading and processing of large text files


I have 10 lists of emails, each over 100 MB, and I want to process them with multiple threads as fast as possible without loading them fully into memory (for example by reading line by line or in small blocks).

I have written one function that removes invalid addresses based on a regex and another that sorts the valid ones into separate lists by domain.

I managed to do it on a single thread with while (reader.Peek() != -1), but it takes way too long.

How can I use multiple threads (around 100 to 200), and maybe a BackgroundWorker or something similar, so that the form stays usable while the lists are processed in parallel?

I'm new to csharp :P


Solution

  • There are multiple approaches:

    1.) You can create threads explicitly, e.g. Thread t = new Thread(...), but creating and managing threads yourself is expensive.
    2.) You can use the .NET ThreadPool and pass your function (as a delegate) to the static ThreadPool.QueueUserWorkItem method. This approach needs some manual code management and synchronization primitives to know when the work has finished.
    3.) You can create an array of System.Threading.Tasks.Task, one per list. The tasks run in parallel on thread-pool threads across the available processors, and you pass the array to Task.WaitAll(Task[]) to wait for their completion. This approach is known as task parallelism; detailed information is available on MSDN.
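As a rough illustration of approach 2, the sketch below queues one work item per list and uses a CountdownEvent as the synchronization primitive. The names ThreadPoolSketch, RunOnThreadPool and processList are made up for this example; processList stands in for whatever function handles a single list.

```csharp
using System;
using System.Threading;

static class ThreadPoolSketch
{
    // Queues one work item per list and blocks until all of them have finished.
    // processList is a hypothetical callback that receives the list index.
    public static void RunOnThreadPool(int listCount, Action<int> processList)
    {
        // The event becomes set after Signal() has been called listCount times.
        using (var done = new CountdownEvent(listCount))
        {
            for (int i = 0; i < listCount; i++)
            {
                int index = i; // copy so each callback captures its own value
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { processList(index); }
                    finally { done.Signal(); } // signal even if the work throws
                });
            }

            done.Wait(); // blocks the calling thread until every item has signaled
        }
    }

    static void Main()
    {
        int finished = 0;
        RunOnThreadPool(10, index => Interlocked.Increment(ref finished));
        Console.WriteLine("Lists processed: " + finished);
    }
}
```

Note that an exception thrown inside a thread-pool work item is not caught here and would normally terminate the process, which is part of the manual management this approach requires.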

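    // requires: using System; using System.Threading.Tasks;
    // ProcessList and listPaths are placeholders for your own function and file paths
    Task[] tasks = new Task[10];
    for (int i = 0; i < tasks.Length; i++)
    {
        string path = listPaths[i]; // copy per iteration so each lambda captures its own path
        // queue the work on a ThreadPool thread; Task.Run returns immediately
        // (on .NET 4.0, use Task.Factory.StartNew instead)
        tasks[i] = Task.Run(() => ProcessList(path));
    }

    try
    {
        // Wait for all tasks to complete.
        // This blocks the calling thread, so do not call it on the UI thread.
        Task.WaitAll(tasks);
    }
    catch (AggregateException ae)
    {
        // handle the aggregate exception here
        // it is raised if one or more tasks throw; the exceptions from all
        // faulted tasks are collected in ae.InnerExceptions
    }

    // continue your processing further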
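
To tie this back to the question, here is a minimal sketch of what the per-list work might look like when each file is streamed line by line. Everything named here (EmailListProcessor, ProcessFile, ProcessAllAsync, the simplified regex) is an assumption for illustration, not the asker's actual code. It uses File.ReadLines, which reads lazily, and await Task.WhenAll instead of Task.WaitAll so the calling thread is not blocked.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class EmailListProcessor
{
    // Deliberately simple pattern, for illustration only; substitute your own regex.
    static readonly Regex EmailPattern =
        new Regex(@"^[^@\s]+@([^@\s]+\.[^@\s]+)$", RegexOptions.Compiled);

    // Streams one file line by line and adds valid addresses to the shared per-domain bags.
    static void ProcessFile(string path,
        ConcurrentDictionary<string, ConcurrentBag<string>> byDomain)
    {
        foreach (string line in File.ReadLines(path)) // lazy: does not load the whole file
        {
            string email = line.Trim();
            Match m = EmailPattern.Match(email);
            if (!m.Success) continue; // drop invalid addresses

            string domain = m.Groups[1].Value.ToLowerInvariant();
            byDomain.GetOrAdd(domain, _ => new ConcurrentBag<string>()).Add(email);
        }
    }

    // Starts one task per file and completes when all of them are done.
    public static async Task<ConcurrentDictionary<string, ConcurrentBag<string>>>
        ProcessAllAsync(string[] paths)
    {
        var byDomain = new ConcurrentDictionary<string, ConcurrentBag<string>>();
        await Task.WhenAll(paths.Select(p => Task.Run(() => ProcessFile(p, byDomain))));
        return byDomain;
    }

    static void Main(string[] args)
    {
        // args: paths of the list files (hypothetical usage)
        var result = ProcessAllAsync(args).GetAwaiter().GetResult();
        foreach (var pair in result)
            Console.WriteLine(pair.Key + ": " + pair.Value.Count);
    }
}
```

In a form, an async event handler could await ProcessAllAsync directly, which keeps the UI responsive without a BackgroundWorker. Two caveats: the per-domain bags still hold every valid address in memory, so for data that does not fit you would write to per-domain output files instead; and with ten files, one task per file is probably enough, since 100 to 200 threads are unlikely to help when the work is limited by disk and CPU cores.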