
Microsoft.Azure.Storage.DataMovement MD5 check


I'm using the Microsoft.Azure.Storage.DataMovement NuGet package to transfer multiple very large (150 GB) files into Azure cold storage using TransferManager.UploadDirectoryAsync.

It works very well, but one bottleneck in my process is that I handle the FileTransferred event and, after each upload, read the local file all over again to calculate its MD5 checksum and compare it to the remote copy's:

private void FileTransferredCallback(object sender, TransferEventArgs e)
{
    var sourceFile = e.Source.ToString();
    var destinationFile = (ICloudBlob) e.Destination;

    // Second full read of the local file (CalculateMd5 is my own helper).
    // ContentMD5 is a Base64 string, so the helper must return Base64 too.
    var localMd5 = CalculateMd5(sourceFile);
    var remoteMd5 = destinationFile.Properties.ContentMD5;

    if (localMd5 == remoteMd5)
    {
        // Record when the checksum was verified in the blob's metadata.
        destinationFile.Metadata.Add(Md5VerifiedKey, DateTimeOffset.UtcNow.ToDisplayText());
        destinationFile.SetMetadata();
    }
}

This is slower than it needs to be, since every file is handled twice: first by the library, then by my MD5 check.

Is this check even necessary or is the library already doing the heavy lifting for me? I can see Md5HashStream but after quickly looking through the source it isn't clear to me if it is being used to verify the entire remote file.


Solution

  • Note that blob.Properties.ContentMD5 for the entire blob is set by the Microsoft.Azure.Storage.DataMovement library, from the MD5 it calculates locally, after it has uploaded all of the blob's blocks. It is not calculated by the Azure Storage Blob service.

    The integrity of the uploaded data is guaranteed by the Content-MD5 HTTP header sent with each individual block, not by blob.Properties.ContentMD5. The Blob service does not validate that value when the library sets it (see the documentation for the x-ms-blob-content-md5 HTTP header). Comparing a separately calculated local MD5 against this property therefore checks the library's own local calculation rather than the bytes the service stored.

    The main purpose of blob.Properties.ContentMD5 is to verify data integrity when the blob is downloaded back to local disk with the Microsoft.Azure.Storage.DataMovement library (provided DownloadOptions.DisableContentMD5Validation is false, which is the default).
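To make the distinction between the per-block hash and the whole-blob hash concrete, here is a minimal, self-contained sketch of how both can be produced in a single pass over the data, so that no second read of the file is needed. It is not the DataMovement library's actual code: SinglePassMd5, HashInBlocks and FillBuffer are hypothetical names, and the block size is whatever the caller passes in. It uses only System.Security.Cryptography and makes no Azure calls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Illustrative sketch only. SinglePassMd5 and its members are hypothetical
// names and are NOT part of Microsoft.Azure.Storage.DataMovement.
public static class SinglePassMd5
{
    // Reads the stream once, block by block, and returns:
    //  - the Base64 MD5 of each block (the kind of value a per-block
    //    Content-MD5 header carries), and
    //  - the Base64 MD5 of the whole stream (the kind of value stored in
    //    blob.Properties.ContentMD5).
    public static (List<string> BlockMd5s, string WholeMd5) HashInBlocks(Stream source, int blockSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        var blockMd5s = new List<string>();
        var buffer = new byte[blockSize];

        using (var blockHasher = MD5.Create())
        using (var wholeHasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            int read;
            while ((read = FillBuffer(source, buffer)) > 0)
            {
                // Hash of this block alone.
                blockMd5s.Add(Convert.ToBase64String(blockHasher.ComputeHash(buffer, 0, read)));

                // Feed the same bytes into the running whole-stream hash.
                wholeHasher.AppendData(buffer, 0, read);
            }

            return (blockMd5s, Convert.ToBase64String(wholeHasher.GetHashAndReset()));
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int FillBuffer(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = source.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

For example, calling SinglePassMd5.HashInBlocks(File.OpenRead(path), 4 * 1024 * 1024) on a local file yields one hash per 4 MiB block plus a single whole-file hash, having read the file only once. The whole-file value matches what a separate MD5 pass over the same file would produce.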