c# · azure-blob-storage · memorystream

Reading and copying large files/blobs without storing them in a MemoryStream in C#


Below is the code that reads blobs from my blob storage and then copies the contents into table storage. Everything works fine now, but I know that if the file is too big, this read-and-copy approach will fail. I would like to know how to handle this ideally: should we write the file to a temporary file instead of holding it in memory (a sketch of what I have in mind follows DownloadBlob below)? If so, can someone give me an example, or show me how to do it in my existing code below?

    public async Task<Stream> ReadStream(string containerName, string digestFileName, string fileName, string connectionString)
    {
        // fileName (and its extension) is not used here yet; the blob is opened via DownloadBlob
        var contents = await DownloadBlob(containerName, digestFileName, connectionString);
        return contents;
    }

    public async Task<Stream> DownloadBlob(string containerName, string fileName, string connectionString)
    {
        Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
        CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
        if (!await blob.ExistsAsync())
        {
            throw new Exception($"Blob '{fileName}' not found in container '{containerName}'");
        }

        // OpenReadAsync returns a stream over the blob rather than downloading it all at once
        return await blob.OpenReadAsync();
    }
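
To make the temp-file idea concrete, this is roughly what I imagine as an alternative to DownloadBlob (just a sketch; the method name, buffer size and temp-file handling are placeholders, not tested code):

    public async Task<Stream> DownloadBlobToTempFile(string containerName, string fileName, string connectionString)
    {
        var storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
        CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

        // Spool the blob to a temp file; DeleteOnClose removes the file when the stream is disposed
        var fileStream = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
            FileShare.None, bufferSize: 81920, FileOptions.DeleteOnClose);

        await blob.DownloadToStreamAsync(fileStream); // streams straight to disk, not into a MemoryStream
        fileStream.Position = 0;                      // rewind so the caller reads from the start
        return fileStream;
    }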

    private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
    {
        using (TextReader reader = new StreamReader(source, Encoding.UTF8))
        {
            var cache = new TypeConverterCache();
            cache.AddConverter<float>(new CSVSingleConverter());
            cache.AddConverter<double>(new CSVDoubleConverter());

            var csv = new CsvReader(reader,
                new CsvHelper.Configuration.CsvConfiguration(System.Globalization.CultureInfo.InvariantCulture)
                {
                    Delimiter = ";",
                    HasHeaderRecord = true,
                    TypeConverterCache = cache
                });
            csv.Read();
            csv.ReadHeader();

            // Map each output column to the index of the first matching CSV header
            var map = (
                from col in cols
                from src in col.Sources()
                let index = csv.GetFieldIndex(src, isTryGet: true)
                where index != -1
                select new { col.Name, Index = index, Type = col.DataType }).ToList();

            // Rows are yielded one at a time, so the whole file is never materialized here
            while (csv.Read())
            {
                yield return map.ToDictionary(
                    col => col.Name,
                    col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
            }
        }
    }
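
For completeness, the table-storage side is essentially a batched insert of those dictionaries, roughly along these lines (a simplified sketch assuming the Microsoft.Azure.Cosmos.Table SDK; WriteToTableAsync and the partition/row key choices are illustrative, not my exact code):

    private async Task WriteToTableAsync(CloudTable table, string partitionKey,
        IEnumerable<Dictionary<string, EntityProperty>> rows)
    {
        var batch = new TableBatchOperation();
        int rowKey = 0;

        foreach (var row in rows)
        {
            batch.Add(TableOperation.InsertOrReplace(
                new DynamicTableEntity(partitionKey, (rowKey++).ToString("D10"), "*", row)));

            // Table storage allows at most 100 operations per batch, all in one partition
            if (batch.Count == 100)
            {
                await table.ExecuteBatchAsync(batch);
                batch = new TableBatchOperation();
            }
        }

        if (batch.Count > 0)
            await table.ExecuteBatchAsync(batch);
    }

Because ReadCSV yields rows lazily, only one batch of at most 100 rows is in memory at a time.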

Solution

  • At your insistence that CsvHelper is incapable of reading from a stream connected to a blob, I threw something together:

    • WinForms .NET Core (3.1) app
    • CsvHelper latest (19)
    • Azure.Storage.Blobs (12.8)

    A CSV from my disk:

    [screenshot: the sample CSV file]

    On my blob storage:

    [screenshot: the same file uploaded to blob storage]

    In my debugger, the record with Sid "CAf255" has been read OK via Read/GetRecord:

    [screenshot: debugger showing the populated record]

    Or via EnumerateRecords:

    [screenshot: debugger showing the record from EnumerateRecordsAsync]

    Using this code:

        private async void button1_Click(object sender, EventArgs e)
        {
            var cstr = "MY CONNECTION STRING HERE";

            var bbc = new BlockBlobClient(cstr, "temp", "call.csv");

            // OpenReadAsync returns a stream that downloads the blob in BufferSize-sized chunks
            var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });

            var sr = new StreamReader(s);
            var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true });

            var x = new X();

            // try by Read/GetRecord (breakpoint and skip over it if you want to try the other way)
            while (await csv.ReadAsync())
            {
                var rec = csv.GetRecord<X>();
                Console.WriteLine(rec.Sid);
            }

            // try by await foreach
            await foreach (var r in csv.EnumerateRecordsAsync(x))
            {
                Console.WriteLine(r.Sid);
            }
        }
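
    One thing I left out for brevity: in real code you would want the stream and readers disposed when you're done, e.g. the same calls wrapped in usings (CsvReader disposes its inner reader by default):

        await using var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });
        using var sr = new StreamReader(s);
        using var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true });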
    

    Oh, and the class that represents a CSV record in my app (I only modeled one property, Sid, to prove the concept):

    class X
    {
        public string Sid { get; set; }
    }
    

    Maybe dial things back a bit and start simple: one string prop in your CSV, no yielding etc., just get the file reading in OK. I didn't bother with all the header faffing either; it seems to just work by saying "file has headers" in the options. You can see my debugger has an instance of X with a correctly populated Sid property showing the first value, and I ran some more loops and they populated OK too.