c# azure json.net azure-blob-storage

Azure Blob storage very slow read time


I am saving JSON data as block blobs in Azure Blob Storage (Standard tier). The file is 14.5 MB and contains about 25,000 objects of OHLC data. I access the blob from an Azure Function located in the same region. The code simply reads the blob for deserialization, but it takes 20-40 seconds. Is there something I missed?

    public static async Task<Stream> GetBlob(string ConnectionString, string ContainerName, string Path)
    {
        BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
        MemoryStream ms = new MemoryStream();

        try
        {
            await blobClient.DownloadToAsync(ms);
            ms.Seek(0, SeekOrigin.Begin);
            return ms;
        }
        catch
        {
            // Dispose the buffer on failure, then rethrow preserving the stack trace.
            ms.Dispose();
            throw;
        }
    }

I request the blob in the function like this:

        log.LogInformation($"Begin Downloading Blob ");
        using (Stream blob = await Core.Azure.Blob.GetBlob(blobString, "containerName", fileName))
        {
            log.LogInformation($"End Downloading Blob ");
            log.LogInformation($"Begin Reading Blob ");
            using (StreamReader reader = new StreamReader(blob))
            {
                string json = await reader.ReadToEndAsync();
                log.LogInformation($"Begin Deserialize Blob ");
                sticks = JsonConvert.DeserializeObject<List<MyModel>>(json);
                log.LogInformation($"End Deserialize Blob ");
            }
        }
        log.LogInformation($"{symbol} End Get Blob ");

The function that checks whether the blob exists:

    public static async Task<bool> CheckExists(string ConnectionString, string ContainerName, string Path)
    {
        BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
        return await blobClient.ExistsAsync();
    }

The total time measured is up to 47 seconds.

I switched to a StreamReader with a JSON reader and it dropped to 10-30 seconds, but that is still a very long time.

Here are the timings:

2021-01-09 23:53:26.656 Begin Downloading Blob
2021-01-09 23:53:30.163 End Downloading Blob
2021-01-09 23:53:30.163 Begin Reading Blob
2021-01-09 23:53:37.040 Begin Deserialize Blob
2021-01-09 23:53:49.737 End Deserialize Blob

Another Run
OHLCData.Json 14.44 MB (28,000 rows)

2021-01-10 12:40:49.970 Begin Check Blob Exists
2021-01-10 12:40:58.962 End Check Blob Exists
2021-01-10 12:40:58.962 Begin Downloading Blob
2021-01-10 12:41:08.181 End Downloading Blob
2021-01-10 12:41:08.187 Begin Reading Blob
2021-01-10 12:41:25.713 Begin Deserialize Blob
2021-01-10 12:41:33.817 End Deserialize Blob
2021-01-10 12:41:33.817 End Get Blob


Solution

  • You are downloading the whole blob into a memory stream (an unnecessary extra memory cost), converting it to a string, and then deserializing it. I would instead deserialize directly from the blob stream in one pass, using the stream support in Newtonsoft.Json as below, to improve both speed and memory use.

    BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
    using (var stream = await blobClient.OpenReadAsync())
    using (var sr = new StreamReader(stream))
    using (var jr = new JsonTextReader(sr))
    {
        result = JsonSerializer.CreateDefault().Deserialize<T>(jr);
    }
    

    You can also do something similar using the System.Text.Json APIs.

    JsonSerializerOptions Options = new JsonSerializerOptions();
    using (var stream = await blobClient.OpenReadAsync())
    {
        result = await JsonSerializer.DeserializeAsync<T>(stream, Options);
    }
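
The second run in the question also spends about 9 seconds in the existence check before the download starts. As a rough sketch only, the streaming read above could be wrapped in a helper that skips `ExistsAsync` and treats an HTTP 404 as "blob not found", which saves one round trip. The class name `BlobJson` and method name `ReadJsonOrDefaultAsync` are hypothetical, and the sketch assumes the Azure.Storage.Blobs v12 and Newtonsoft.Json packages, with a missing blob surfacing as a `RequestFailedException` whose `Status` is 404. Passing in a shared `BlobContainerClient` instead of building a new client on every call is a design choice here, not something the original code requires.

```csharp
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Newtonsoft.Json;

// Hypothetical helper: names are illustrative, not part of any SDK.
public static class BlobJson
{
    // Streams the blob straight into the Newtonsoft.Json deserializer.
    // Returns default(T) (null for reference types) if the blob does not exist.
    public static async Task<T> ReadJsonOrDefaultAsync<T>(BlobContainerClient container, string path)
    {
        BlobClient blobClient = container.GetBlobClient(path);
        try
        {
            using (Stream stream = await blobClient.OpenReadAsync())
            using (var sr = new StreamReader(stream))
            using (var jr = new JsonTextReader(sr))
            {
                // No intermediate MemoryStream or string: tokens are read
                // from the network stream as the deserializer needs them.
                return JsonSerializer.CreateDefault().Deserialize<T>(jr);
            }
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            // Assumption: a missing blob is reported as HTTP 404.
            return default;
        }
    }
}

// Possible usage inside the function, reusing one container client:
//   var container = new BlobContainerClient(blobString, "containerName");
//   List<MyModel> sticks =
//       await BlobJson.ReadJsonOrDefaultAsync<List<MyModel>>(container, fileName);
```

This does not explain the slow deserialization step on its own; it only removes the extra request and the intermediate copies, so it is worth timing each phase again afterwards.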