I'm using the Microsoft.Azure.Storage.DataMovement nuget package to transfer multiple, very large (150GB) files into Azure cold storage using
TransferManager.UploadDirectoryAsync
It works very well, but a choke point in my process is that after upload I am attaching to the FileTransferred
event and reading the local file all over again to calculate the md5 checksum and compare it to the remote copy:
private void FileTransferredCallback(object sender, TransferEventArgs e)
{
var sourceFile = e.Source.ToString();
var destinationFile = (ICloudBlob) e.Destination;
var localMd5 = CalculateMd5(e.Source.ToString());
var remoteMd5 = destinationFile.Properties.ContentMD5;
if (localMd5 == remoteMd5)
{
destinationFile.Metadata.Add(Md5VerifiedKey, DateTimeOffset.UtcNow.ToDisplayText());
destinationFile.SetMetadata();
}
}
It is slower than it needs to be since every file is getting double handled - first by the library, then by my MD5 check.
Is this check even necessary or is the library already doing the heavy lifting for me? I can see Md5HashStream but after quickly looking through the source it isn't clear to me if it is being used to verify the entire remote file.
Note that metadata blob.Properties.ContentMD5
of the entire blob is actually set by Microsoft.Azure.Storage.DataMovement library per its local calculation result after uploading all the blocks of this blob, not by Azure Storage Blob Service.
The data integrity of blob uploading is guaranteed by Content-MD5 HTTP header when putting every single block, not by metadata blob.Properties.ContentMD5
of the entire blob, since Azure Storage Blob Service doesn't really validate the value when Microsoft.Azure.Storage.DataMovement library is setting metadata (check the introduction of x-ms-blob-content-md5 HTTP header).
The main purpose of blob.Properties.ContentMD5
is to verify the data integrity when downloading the blob back to local disk via Microsoft.Azure.Storage.DataMovement library (if DownloadOptions.DisableContentMD5Validation
is set to false, which is the default behavior).