c# asp.net azure azure-storage

ContentHash not calculated in Azure Blob Storage v12


Continuing the saga, here is part I: ContentHash is null in Azure.Storage.Blobs v12.x.x

After a lot of debugging, the root cause appears to be that the content hash is not calculated when a blob is uploaded, so BlobContentInfo and BlobProperties return a null content hash. My whole flow depends on receiving the hash from Azure.

What I've discovered is that it depends on which HttpRequest stream I upload to Azure:

With HttpRequest.GetBufferlessInputStream(), the content hash is not calculated. Even in Azure Storage Explorer, the ContentMD5 of the blob is empty.

With HttpRequest.InputStream, everything works as expected.


Do you know why the behavior differs? And do you know how to get the content hash for streams obtained from GetBufferlessInputStream()?

So the code flow looks like this:

var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
     blobHttpHeaders = new BlobHttpHeaders()
     {
          ContentType = fileContentType,
     };
}

// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);

return await blob.GetPropertiesAsync();

In the snippet above, ContentHash is NOT calculated. If I instead get the stream from the HTTP request as in the following snippet, ContentHash is calculated.

var stream = HttpContext.Current.Request.InputStream;

P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams obtained from GetBufferlessInputStream().

P.S.2 There is also an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037

P.S.3 Added a code snippet.


Solution

  • A workaround: after getting the stream via GetBufferlessInputStream(), copy it into a MemoryStream and upload the MemoryStream instead. The content hash is then generated. Sample code:

            var bufferlessStream = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);
            // Copy into a seekable MemoryStream (this buffers the whole request body in memory).
            MemoryStream stream = new MemoryStream();
            await bufferlessStream.CopyToAsync(stream);
            // Rewind so the upload starts from the first byte.
            stream.Position = 0;
    
            // other code
            // retries are already configured on the Azure Storage client
            await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
    

    I am not sure of the exact reason, but while debugging I could see that when the stream comes from GetBufferlessInputStream(), the latest SDK uploads it through the Put Block API in the backend. With that API, the MD5 hash is not stored with the blob (see the Azure Storage REST API documentation for details). A likely explanation is that the bufferless stream is not seekable, so the SDK cannot send it as a single request.

    [Screenshot: the Put Block call observed while debugging]

    When the stream comes from InputStream, however, the SDK calls the Put Blob API.

    [Screenshot: the Put Blob call observed while debugging]
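
If buffering the whole request in a MemoryStream is not acceptable, another option might be to compute the MD5 yourself while the SDK reads the stream, then store it on the blob afterwards. The following is only a sketch under a few assumptions: `UploadWithHashAsync` and `BlobUploadSketch` are hypothetical names (not part of the SDK), it assumes the SDK reads the non-seekable wrapper to the end during upload, and the stored hash is whatever the client supplies, since the service does not recompute it.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

public static class BlobUploadSketch
{
    // Hypothetical helper: uploads a non-seekable stream and stores a
    // client-computed MD5 as the blob's content hash.
    public static async Task<BlobProperties> UploadWithHashAsync(
        BlockBlobClient blob, Stream source, string contentType)
    {
        using (var md5 = MD5.Create())
        // Every byte the SDK reads from this wrapper also passes through md5.
        // Disposing the wrapper disposes the source stream as well.
        using (var hashing = new CryptoStream(source, md5, CryptoStreamMode.Read))
        {
            await blob.UploadAsync(
                hashing,
                httpHeaders: new BlobHttpHeaders { ContentType = contentType });

            // Assumption: the upload read the wrapper to the end, which
            // finalizes the hash and makes md5.Hash available.
            var headers = new BlobHttpHeaders
            {
                // SetHttpHeadersAsync replaces all headers, so repeat ContentType.
                ContentType = contentType,
                ContentHash = md5.Hash,
            };
            await blob.SetHttpHeadersAsync(headers);
        }

        // The properties should now report the stored content hash.
        return (await blob.GetPropertiesAsync()).Value;
    }
}
```

In the question's flow this would be called as `await BlobUploadSketch.UploadWithHashAsync(blob, stream, fileContentType)`. The trade-off is one extra request per upload and a short window in which the blob exists without a hash.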