I'm writing a C# app to download Google Drive files using the v3 API, and checking the MD5 hash supplied by Google to confirm each download. All is going well, and the app is working, except that I'm getting a 75+ percent failure rate on the MD5 check when files are larger than 2GB. Some work, most don't.
If I check with a third-party MD5 utility, it gives the correct hash (same as Google Drive). I've also tried downloading outside my app (i.e. through a browser), in case my app is doing something weird with the download, but those files also fail the MD5 check in my app. So it's clearly something happening at my end.
I'm using the C# System.Security.Cryptography.MD5 class, using TransformBlock and TransformFinalBlock. I've tried different buffer sizes, just for fun, but no luck. I also tried hashing the full file with ComputeHash(Stream), but this fails as well.
The only thing I can see (as a complete grasping at straws) is that the inputOffset and inputCount parameters are int, which could account for the 2GB limit if these functions have an internal "total file size" or similar which is also an int (32 bit signed - assumed).
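One way to test the int theory without waiting on a real download is to hash a synthetic stream larger than 2GB. The sketch below is only an illustration: ZeroStream and Md5SizeCheck are hypothetical names, not part of the app or of .NET. It relies on inputOffset and inputCount describing only the slice of the buffer passed in a single call, so they stay small however large the file is.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: a read-only stream that yields `length` zero bytes
// without touching the disk, so sizes above int.MaxValue are cheap to test.
public sealed class ZeroStream : Stream
{
    private readonly long _length;
    private long _position;

    public ZeroStream(long length) { _length = length; }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _length;
    public override long Position
    {
        get => _position;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Never hand out more than the bytes remaining in the stream.
        int n = (int)Math.Min(count, _length - _position);
        Array.Clear(buffer, offset, n);
        _position += n;
        return n;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class Md5SizeCheck
{
    // Hashes the stream chunk by chunk and returns lowercase hex.
    public static string HashInChunks(Stream stream, int bufferSize)
    {
        using (var md5 = MD5.Create())
        {
            var block = new byte[bufferSize];
            int length;
            // Read returns 0 only at end of stream.
            while ((length = stream.Read(block, 0, block.Length)) > 0)
                md5.TransformBlock(block, 0, length, null, 0);
            md5.TransformFinalBlock(block, 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling Md5SizeCheck.HashInChunks(new ZeroStream(3L * 1024 * 1024 * 1024), 65536) and comparing the result with what a third-party tool reports for a 3GB file of zeros should show whether the hash itself goes wrong past 2GB. If the two agree, the problem is more likely in how the file is being read than in the MD5 class.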
The other thing I am noticing is that the process will pause every 8-25%, with no CPU, disk, RAM, garbage collection, or other activity for anywhere up to a couple of minutes before it continues. When it's "running" I see disk, CPU, etc., as expected, and progress goes reasonably quickly. This pause doesn't seem to affect whether the final hash is "successful" or not, but may be related (I see it on largish files under 2GB as well).
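To narrow down where the pauses come from, it may help to time the reads on their own, with no hashing involved. This is a rough diagnostic sketch, not a fix. ReadStallProbe and FindSlowReads are made-up names, and the threshold is whatever counts as a stall for you.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class ReadStallProbe
{
    // Reads the stream to the end and returns the starting offset and duration
    // of every Read call that took at least `threshold`.
    public static List<KeyValuePair<long, TimeSpan>> FindSlowReads(
        Stream stream, TimeSpan threshold, int bufferSize = 65536)
    {
        var slow = new List<KeyValuePair<long, TimeSpan>>();
        var block = new byte[bufferSize];
        var timer = new Stopwatch();
        long offset = 0;
        while (true)
        {
            timer.Restart();
            int length = stream.Read(block, 0, block.Length);
            timer.Stop();
            if (length <= 0) break; // end of stream
            if (timer.Elapsed >= threshold)
                slow.Add(new KeyValuePair<long, TimeSpan>(offset, timer.Elapsed));
            offset += length;
        }
        return slow;
    }
}
```

Running it over the same FileStream with a threshold of, say, TimeSpan.FromSeconds(1) would list the offsets where a read stalled. If the stalls still show up here, they come from somewhere in the read path rather than from the MD5 calls.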
Does anyone know if this is an issue? I've seen a couple of people asking similar questions on issues with large file hashes, but with the unhelpful answer that hashes should always be the same... Yes, they should, but it appears they might not be. The weirdest thing is the occasional hash works on a large file.
Below is a simplification of the code (error checking, progress reporting, etc. taken out for quick readability - and yes, I've tried this simplified code as well - same issue). Not the cleanest, but it works (aside from >2GB files). Thanks in advance for any advice or knowledge of the issue.
    // Requires: using System; using System.IO; using System.Security.Cryptography;
    const int buffersize = 65536;
    using (var md5 = MD5.Create())
    using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, buffersize))
    {
        var block = new byte[buffersize];
        Int64 filesize = stream.Length;  // only needed for progress reporting
        Int64 bytesread = 0;
        int length;
        // Stream.Read may return fewer bytes than requested before the end of
        // the stream, so loop until it returns 0 rather than testing for a short read.
        while ((length = stream.Read(block, 0, block.Length)) > 0)
        {
            md5.TransformBlock(block, 0, length, null, 0);
            bytesread += length;
        }
        // All data has been hashed already; finalize with an empty block.
        md5.TransformFinalBlock(block, 0, 0);
        var hash = md5.Hash;
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
For absolutely no known reason, it now works. As best as I can guess (and it's completely theoretical and shouldn't happen), I was working on other parts of the app, and this shifted the location of something to avoid whatever was wrong...
It still pauses here and there for no known reason, but hashes are now returning correctly (matching when they should, not matching when they shouldn't).