I am saving JSON data as block blobs in Azure Blob Storage (Standard tier). The file is 14.5 MB and contains about 25,000 objects of OHLC data. I access the blob from an Azure Function located in the same region. The code simply reads the blob and deserializes it, but it takes 20-40 seconds. Is there something I missed?
public static async Task<Stream> GetBlob(string ConnectionString, string ContainerName, string Path)
{
    BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
    MemoryStream ms = new MemoryStream();
    try
    {
        // Buffer the entire blob in memory, then rewind so the caller reads from the start
        await blobClient.DownloadToAsync(ms);
        ms.Seek(0, SeekOrigin.Begin);
        return ms;
    }
    catch
    {
        // Release the buffer if the download fails, then rethrow
        ms.Dispose();
        throw;
    }
}
And I request the blob in the function:
log.LogInformation($"Begin Downloading Blob ");
using (Stream blob = await Core.Azure.Blob.GetBlob(blobString, "containerName", fileName))
{
    log.LogInformation($"End Downloading Blob ");
    log.LogInformation($"Begin Reading Blob ");
    using (StreamReader reader = new StreamReader(blob))
    {
        string json = await reader.ReadToEndAsync();
        log.LogInformation($"Begin Deserialize Blob ");
        sticks = JsonConvert.DeserializeObject<List<MyModel>>(json);
        log.LogInformation($"End Deserialize Blob ");
    }
}
log.LogInformation($"{symbol} End Get Blob ");
The function that checks whether the blob exists:
public static async Task<bool> CheckExists(string ConnectionString, string ContainerName, string Path)
{
    BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
    return await blobClient.ExistsAsync();
}
The timings show it taking up to 47 seconds.
When I switched to a StreamReader with a JSON reader, it dropped to 10-30 seconds, but that is still a very long time.
Here are the timings:
2021-01-09 23:53:26.656 Begin Downloading Blob
2021-01-09 23:53:30.163 End Downloading Blob
2021-01-09 23:53:30.163 Begin Reading Blob
2021-01-09 23:53:37.040 Begin Deserialize Blob
2021-01-09 23:53:49.737 End Deserialize Blob
Another Run
OHLCData.Json 14.44 MB (28,000 rows)
2021-01-10 12:40:49.970 Begin Check Blob Exists
2021-01-10 12:40:58.962 End Check Blob Exists
2021-01-10 12:40:58.962 Begin Downloading Blob
2021-01-10 12:41:08.181 End Downloading Blob
2021-01-10 12:41:08.187 Begin Reading Blob
2021-01-10 12:41:25.713 Begin Deserialize Blob
2021-01-10 12:41:33.817 End Deserialize Blob
2021-01-10 12:41:33.817 End Get Blob
You are downloading the whole blob into a MemoryStream (an unnecessary extra copy in memory), converting it to a string (which, as UTF-16, takes roughly twice the file size for ASCII JSON) and only then deserializing it. I would instead deserialize directly from the blob stream in one pass, using the stream support in Newtonsoft.Json, to reduce both time and memory use:
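using Azure.Storage.Blobs;
using Newtonsoft.Json;

BlobClient blobClient = new BlobClient(ConnectionString, ContainerName, Path);
// OpenReadAsync returns a stream that downloads the blob in chunks as it is read
using (var stream = await blobClient.OpenReadAsync())
using (var sr = new StreamReader(stream))
using (var jr = new JsonTextReader(sr))
{
    // Newtonsoft.Json reads tokens straight off the stream; no intermediate string is built
    result = JsonSerializer.CreateDefault().Deserialize<T>(jr);
}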
You can do the same with the System.Text.Json APIs:
using System.Text.Json;

var options = new JsonSerializerOptions();
using (var stream = await blobClient.OpenReadAsync())
{
    // DeserializeAsync consumes the UTF-8 bytes directly; no StreamReader or string needed
    result = await JsonSerializer.DeserializeAsync<T>(stream, options);
}
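To try the System.Text.Json approach without touching storage, here is a minimal sketch in which a MemoryStream stands in for the stream returned by blobClient.OpenReadAsync(). The Candle record and the two-row payload are hypothetical stand-ins for MyModel and the real file, and the sketch assumes .NET 5 or later (top-level statements and positional records).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

// In-memory stand-in for the blob (hypothetical data); in the Function this Stream
// would come from `await blobClient.OpenReadAsync()` instead.
string payload =
    "[{\"Open\":1.10,\"High\":1.20,\"Low\":1.05,\"Close\":1.15}," +
    " {\"Open\":1.15,\"High\":1.25,\"Low\":1.12,\"Close\":1.22}]";
using Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(payload));

// Create the options once and reuse them: System.Text.Json caches per-type
// serialization metadata on the options instance.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

// Deserialize directly from the stream: no StreamReader, no intermediate string.
List<Candle> candles =
    await JsonSerializer.DeserializeAsync<List<Candle>>(stream, options) ?? new List<Candle>();

Console.WriteLine($"Read {candles.Count} candles; last close = {candles[^1].Close}");

// Hypothetical model standing in for MyModel from the question.
public record Candle(decimal Open, decimal High, decimal Low, decimal Close);
```

In the Function, the only changes would be swapping the MemoryStream for await blobClient.OpenReadAsync() and Candle for MyModel. The deserializer then consumes the blob as it arrives, instead of the whole file first being materialized as bytes and then again as a string.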