Tags: c#, csv, .net-core, dotnet-httpclient, streamreader

Skip First Row (CSV Header Row) of HttpResponseMessage Content.ReadAsStream


Below is a simplified example of a larger piece of code. Basically, I'm calling one or more API endpoints and downloading a CSV file that gets written to an Azure Blob Container. If there are multiple files, the blob is appended to for every new CSV file loaded.

The issue is that when I append to the target blob, I end up with multiple header rows scattered throughout the file, depending on how many CSVs I consumed. All the CSVs have the same header row, and I know the first row will always end with a line feed. Is there a way to read the stream, skip the content until after the first line feed, and then copy the rest of the stream to the blob?

It seemed simple in my head, but I'm having trouble finding my way there code-wise. I don't want to wait for the whole file to download and then delete the header row in memory, since some of these files can be several gigabytes.

I am using .NET 6, if that helps.

using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);

        Stream sourceStream = response.Content.ReadAsStream();
        sourceStream.CopyTo(blobStream);
    }
}

Solution

  • .CopyTo copies from the current position in the stream, so all you need to do is read and discard bytes up to and including the first line feed.

    using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
    {
        for (int i = 0; i < 3; i++)
        {
            using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
    
            Stream sourceStream = response.Content.ReadAsStream();
    
            if (i != 0)
            {
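                // ReadByte returns -1 at end of stream, so stop there too
                // instead of looping forever when no line feed is found.
                int b;
                do { b = sourceStream.ReadByte(); } while (b != -1 && b != '\n');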
            }
            sourceStream.CopyTo(blobStream);
        }
    }
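
The skip loop above can also be packaged as a small helper. This is a minimal sketch rather than part of the original answer: the names `CsvStreamHelpers` and `SkipFirstLine` are hypothetical, and it assumes the header row contains no quoted field with an embedded line feed.

```csharp
using System.IO;

public static class CsvStreamHelpers
{
    // Reads and discards bytes up to and including the first '\n'.
    // Returns false if the stream ended before a line feed was found.
    // Assumption: the header row has no embedded line feed inside quotes.
    public static bool SkipFirstLine(Stream source)
    {
        int b;
        // Stream.ReadByte returns -1 once the end of the stream is reached.
        while ((b = source.ReadByte()) != -1)
        {
            if (b == '\n')
            {
                return true;
            }
        }
        return false;
    }
}
```

Inside the loop this would be called as `if (i != 0) CsvStreamHelpers.SkipFirstLine(sourceStream);` just before `sourceStream.CopyTo(blobStream);`. Because a `\r\n` terminator ends in `\n`, both line-ending styles are handled.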
    

    If the header row is always exactly the same number of bytes in every file, you can define a constant for its length. That way you can skip exactly that many bytes, like this:

    using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
    {
        for (int i = 0; i < 3; i++)
        {
            using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
    
            Stream sourceStream = response.Content.ReadAsStream();
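            if (i != 0)
            {
                // A response stream opened with ResponseHeadersRead is not seekable,
                // so Seek would throw NotSupportedException. Read and discard instead.
                byte[] header = new byte[HeaderSizeInBytes];
                int total = 0, read;
                while (total < header.Length && (read = sourceStream.Read(header, total, header.Length - total)) > 0)
                    total += read;
            }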
            sourceStream.CopyTo(blobStream);
        }
    }
    

    This will be slightly quicker, but it has the downside that it breaks as soon as the header row changes length, so the files can't easily change format in the future.
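
One way to arrive at a value for `HeaderSizeInBytes` is to compute it from the known header text instead of counting by hand. The sketch below is an illustration under assumptions, not something from the original answer: it assumes the files are UTF-8 encoded, and the names `CsvHeader` and `GetSizeInBytes` are hypothetical.

```csharp
using System.Text;

public static class CsvHeader
{
    // Returns the number of bytes occupied by the header row, including its
    // line terminator and, optionally, a leading UTF-8 byte order mark.
    // Assumption: the files are UTF-8 encoded.
    public static int GetSizeInBytes(string headerLine, string newline, bool hasUtf8Bom)
    {
        int size = Encoding.UTF8.GetByteCount(headerLine)
                 + Encoding.UTF8.GetByteCount(newline);

        if (hasUtf8Bom)
        {
            // The UTF-8 preamble (EF BB BF) is 3 bytes long.
            size += Encoding.UTF8.GetPreamble().Length;
        }

        return size;
    }
}
```

For example, `CsvHeader.GetSizeInBytes("id,name", "\r\n", false)` counts 7 bytes of text plus 2 bytes of line terminator. If the real files differ in encoding, line endings, or BOM, the computed value will be wrong, which is the fragility mentioned above.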


    P.S. You probably want to dispose sourceStream, either directly or by wrapping its creation in a using statement.