I've written an extension method for BlockBlobClient that fetches a specific block by its block ID. Can this code be tweaked to improve performance or anything else?
public static async Task<T> GetBlockByIdAsync<T>(this BlockBlobClient blockBlobClient, string blockId, CancellationToken cancellationToken)
{
    var blockListResponse = await blockBlobClient.GetBlockListAsync(cancellationToken: cancellationToken);
    var blockList = blockListResponse.Value.CommittedBlocks.ToList();
    var currentBlock = blockList.FirstOrDefault(a => a.Name == blockId);
    if (currentBlock.Name == null)
    {
        throw new InvalidOperationException($"Could not find BlockId {blockId}");
    }
    var length = currentBlock.SizeLong;
    var index = blockList.FindIndex(a => a.Name == blockId);
    var offset = 0L;
    for (var i = 0; i < index; i++)
    {
        offset += blockList[i].SizeLong;
    }
    var options = new BlobDownloadOptions()
    {
        Range = new HttpRange(offset, length)
    };
    var blockInfo = await blockBlobClient.DownloadStreamingAsync(options, cancellationToken);
    return JsonSerializer.Deserialize<T>(blockInfo.Value.Content);
}
The only thing that really stands out is that blockList is iterated multiple times (3 times):
blockList.FirstOrDefault(a => a.Name == blockId)
blockList.FindIndex(a => a.Name == blockId)
for (var i = 0; i < index; i++)
You can do all of this in a single iteration, something like this:
public static async Task<T> GetBlockByIdAsync<T>(this BlockBlobClient blockBlobClient, string blockId, CancellationToken cancellationToken)
{
    var blockListResponse = await blockBlobClient.GetBlockListAsync(cancellationToken: cancellationToken);
    var blockList = blockListResponse.Value.CommittedBlocks.ToList();
    // Track "found" explicitly instead of inferring it from length == 0.
    var found = false;
    var length = 0L;
    var offset = 0L;
    // Iterate over the blocks until we find the one we want.
    foreach (var block in blockList)
    {
        if (block.Name == blockId)
        {
            length = block.SizeLong;
            found = true;
            break;
        }
        // We haven't found the block we want yet, so update the offset.
        offset += block.SizeLong;
    }
    // Check if we found the block we were looking for.
    if (!found)
    {
        throw new InvalidOperationException($"Could not find BlockId {blockId}");
    }
    var options = new BlobDownloadOptions()
    {
        Range = new HttpRange(offset, length)
    };
    var blockInfo = await blockBlobClient.DownloadStreamingAsync(options, cancellationToken);
    return JsonSerializer.Deserialize<T>(blockInfo.Value.Content);
}
CAVEAT: I haven't run this code locally or even tried to compile it, so there may be errors, but the idea is sound.
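Since the method above is untested, one way to check the single-pass idea without touching Azure is to pull the offset/length search into a pure helper. The sketch below is only an illustration: BlockRangeFinder, TryFindRange and the (Name, Size) tuples are hypothetical stand-ins for the SDK's block list, not part of Azure.Storage.Blobs.

```csharp
using System;
using System.Collections.Generic;

public static class BlockRangeFinder
{
    // Hypothetical helper (not an Azure SDK API): walks the blocks once and
    // reports the byte offset and length of the block named blockId.
    public static bool TryFindRange(
        IEnumerable<(string Name, long Size)> blocks,
        string blockId,
        out long offset,
        out long length)
    {
        offset = 0L;
        length = 0L;
        foreach (var (name, size) in blocks)
        {
            if (name == blockId)
            {
                length = size;
                return true;
            }
            // This block comes before the one we want, so it adds to the offset.
            offset += size;
        }
        // Not found: reset the offset so callers never see a meaningless total.
        offset = 0L;
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var blocks = new List<(string Name, long Size)>
        {
            ("a", 10L), ("b", 20L), ("c", 5L)
        };
        if (BlockRangeFinder.TryFindRange(blocks, "c", out var offset, out var length))
        {
            // Blocks "a" and "b" precede "c", so this prints offset=30, length=5.
            Console.WriteLine($"offset={offset}, length={length}");
        }
    }
}
```

Returning a bool rather than checking for a zero length keeps "not found" distinct from any legitimately small block.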
UPDATE: There was an iteration that I missed: blockListResponse.Value.CommittedBlocks.ToList().
Granted, calling ToList() prevents CommittedBlocks from being enumerated repeatedly, but this could be even more efficient.
Assuming blockListResponse.Value.CommittedBlocks returns an IEnumerable<BlobBlock>, you could even remove the ToList() call:
public static async Task<T> GetBlockByIdAsync<T>(this BlockBlobClient blockBlobClient, string blockId, CancellationToken cancellationToken)
{
    var blockListResponse = await blockBlobClient.GetBlockListAsync(cancellationToken: cancellationToken);
    var blockList = blockListResponse.Value.CommittedBlocks;
    // Track "found" explicitly instead of inferring it from length == 0.
    var found = false;
    var length = 0L;
    var offset = 0L;
    // Iterate over the blocks until we find the one we want.
    // The enumerator is disposable, so release it when the method exits.
    using var blockListEnumerator = blockList.GetEnumerator();
    while (blockListEnumerator.MoveNext())
    {
        // Current is already typed as BlobBlock, so no cast is needed.
        var block = blockListEnumerator.Current;
        if (block.Name == blockId)
        {
            length = block.SizeLong;
            found = true;
            break;
        }
        // We haven't found the block we want yet, so update the offset.
        offset += block.SizeLong;
    }
    // Check if we found the block we were looking for.
    if (!found)
    {
        throw new InvalidOperationException($"Could not find BlockId {blockId}");
    }
    var options = new BlobDownloadOptions()
    {
        Range = new HttpRange(offset, length)
    };
    var blockInfo = await blockBlobClient.DownloadStreamingAsync(options, cancellationToken);
    return JsonSerializer.Deserialize<T>(blockInfo.Value.Content);
}
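To see what dropping ToList() can buy, here is a small self-contained sketch that counts how many items a lazily produced sequence yields before the search stops. LazyDemo, Blocks and CountProducedUntil are hypothetical names, and whether CommittedBlocks is actually lazy is an assumption. If the SDK already hands back a materialized collection, the saving is just the extra copy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator below has produced.
    public static int Produced;

    // Hypothetical stand-in for a lazily evaluated block list.
    public static IEnumerable<(string Name, long Size)> Blocks()
    {
        foreach (var name in new[] { "a", "b", "c", "d" })
        {
            Produced++;
            yield return (name, 10L);
        }
    }

    // Searches for blockId and returns how many items had to be produced.
    public static int CountProducedUntil(string blockId, bool materialize)
    {
        Produced = 0;
        IEnumerable<(string Name, long Size)> source =
            materialize ? Blocks().ToList() : Blocks();
        foreach (var block in source)
        {
            if (block.Name == blockId)
            {
                break; // Early exit: a lazy source stops producing here.
            }
        }
        return Produced;
    }
}

public static class Program
{
    public static void Main()
    {
        // ToList() forces all four items to be produced before the search starts.
        Console.WriteLine(LazyDemo.CountProducedUntil("b", materialize: true));
        // Without ToList(), only "a" and "b" are produced before the break.
        Console.WriteLine(LazyDemo.CountProducedUntil("b", materialize: false));
    }
}
```

Either way, the block list is only enumerated once, which is the main point of the rewrite.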