c# concurrency task-parallel-library azure-data-lake parallel.for

ADLS ConcurrentAppend giving corrupt data for 1 MB files


When I use a Parallel.For loop to concurrently append 10 files of 1 MB each to a file in Azure Data Lake Store (ADLS), the resulting file contains only the content of the last 2 files, even though the correct data is printed to the console.

When I use a simple for loop instead of Parallel.For, the data appended to the file is correct.

Any help?

Parallel.For(0, 10, i =>
{
    path[i] = @"C:\Users\t-chkum\Desktop\InputFiles\1MB\" + (i + 1) + ".txt";

    FileStream stream = File.OpenRead(path[i]);

    stream.Read(buffer, 0, buffer.Length);
    Console.WriteLine(Encoding.UTF8.GetString(buffer));


    client.ConcurrentAppend(fileName, true, buffer, 0, buffer.Length);

    stream.Close();
});

Solution

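  • This was a critical-section problem: every iteration reads into and appends from the same shared buffer, so parallel iterations overwrite each other's data. It can be solved using either a BlockingCollection or a lock: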

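    // A bounded collection of capacity 1 acts as a mutex: Add blocks while the slot is taken.
    BlockingCollection<int> b = new BlockingCollection<int>(1);
    Parallel.For(0, 10, i =>
    {
        b.Add(i); // enter the critical section
        try
        {
            path[i] = @"C:\Users\t-chkum\Desktop\InputFiles\1MB\" + (i + 1) + ".txt";
            using (FileStream stream = File.OpenRead(path[i]))
            {
                // Only one iteration at a time touches the shared buffer here.
                stream.Read(buffer, 0, buffer.Length);
                client.ConcurrentAppend(fileName, true, buffer, 0, buffer.Length);
                Array.Clear(buffer, 0, buffer.Length);
            }
        }
        finally
        {
            b.Take(); // leave the critical section, even if an exception was thrown
        }
    });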
    

    The above code solves the problem for me :)
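
    The lock alternative mentioned above is not shown, so here is a minimal sketch of it, next to a second option that gives each iteration its own buffer and therefore needs no critical section at all. This is an illustration under assumptions, not the ADLS SDK: FakeAppendClient, ReadFile, AppendDemo and the 1 KB demo size are hypothetical stand-ins so the example runs without an Azure account. Only the ConcurrentAppend call shape is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the ADLS client: it only records what was appended.
class FakeAppendClient
{
    private readonly object _gate = new object();
    public List<byte[]> Chunks { get; } = new List<byte[]>();

    // Same argument shape as the call in the question; copies the data it is given.
    public void ConcurrentAppend(string path, bool autoCreate, byte[] data, int offset, int length)
    {
        var copy = new byte[length];
        Buffer.BlockCopy(data, offset, copy, 0, length);
        lock (_gate) { Chunks.Add(copy); }
    }
}

static class AppendDemo
{
    const int FileCount = 10;
    const int FileSize = 1024; // assumption: small size to keep the demo fast

    // Stand-in for reading file i: fills the target with a byte unique to that file.
    static void ReadFile(int i, byte[] target)
    {
        for (int k = 0; k < target.Length; k++) target[k] = (byte)(i + 1);
    }

    // Option 1: keep the shared buffer, but let only one iteration use it at a time.
    public static FakeAppendClient WithLock()
    {
        var client = new FakeAppendClient();
        var shared = new byte[FileSize];
        var sync = new object();
        Parallel.For(0, FileCount, i =>
        {
            lock (sync)
            {
                ReadFile(i, shared);
                client.ConcurrentAppend("demo.txt", true, shared, 0, shared.Length);
            }
        });
        return client;
    }

    // Option 2: a private buffer per iteration, so there is nothing shared to protect.
    public static FakeAppendClient WithLocalBuffers()
    {
        var client = new FakeAppendClient();
        Parallel.For(0, FileCount, i =>
        {
            var local = new byte[FileSize];
            ReadFile(i, local);
            client.ConcurrentAppend("demo.txt", true, local, 0, local.Length);
        });
        return client;
    }

    // True when every chunk holds a single byte value and all ten files appear exactly once.
    public static bool IsIntact(FakeAppendClient client)
    {
        return client.Chunks.Count == FileCount
            && client.Chunks.All(c => c.All(b => b == c[0]))
            && client.Chunks.Select(c => c[0]).Distinct().Count() == FileCount;
    }

    static void Main()
    {
        Console.WriteLine("lock:          intact = " + IsIntact(WithLock()));
        Console.WriteLine("local buffers: intact = " + IsIntact(WithLocalBuffers()));
    }
}
```

    The lock version serializes the read and the append, much like the BlockingCollection of capacity 1, so the loop effectively runs one file at a time. The per-iteration buffer costs one allocation per file but lets the appends overlap, which is usually the point of using Parallel.For in the first place.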