I am looking for an efficient way of calculating the hash of big files (~3 GB) and noticed that calling Windows certutil with the -hashfile argument
performs the hash computation about 4 times faster (~16 s) than doing it via SHA512.Create().ComputeHash
(~60 s), and I don't understand such a big difference.
I tried playing with the read-buffer size of the FileStream,
but that only changed the time by about 2 seconds, so it is not a significant optimization.
1) Hash computation via ComputeHash:
var sw = Stopwatch.StartNew();
using (var sha = SHA512.Create()) // dispose the hash algorithm as well
using (var fs = new FileStream(@"C:\Temp\BigFile.dat", FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1024 * 1024 * 10, options: FileOptions.SequentialScan))
{
    Console.WriteLine(BitConverter.ToString(sha.ComputeHash(fs)));
}
Console.WriteLine(sw.ElapsedMilliseconds);
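For reference, the buffer-size experiment mentioned above can also be done with explicit chunked reads via TransformBlock, which makes the read size directly controllable (a minimal sketch, assuming the same file path as above):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

class ChunkedHash
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        using (var sha = SHA512.Create())
        using (var fs = File.OpenRead(@"C:\Temp\BigFile.dat"))
        {
            var buffer = new byte[1024 * 1024]; // vary this to test the effect of read size
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                sha.TransformBlock(buffer, 0, read, null, 0); // feed each chunk to the hash
            sha.TransformFinalBlock(Array.Empty<byte>(), 0, 0); // finalize with an empty block
            Console.WriteLine(BitConverter.ToString(sha.Hash));
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}
```

In practice this behaves like ComputeHash(fs) (which reads in 4 KB chunks internally), which matches the observation that buffer size barely moves the total time.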
2) Hash computation via certutil:
var sw = Stopwatch.StartNew();
var fPath = @"C:\Temp\BigFile.dat";
var p = Process.Start(new ProcessStartInfo(@"certutil", $"-hashfile \"{fPath}\" SHA512") { RedirectStandardOutput = true, UseShellExecute = false });
Console.WriteLine(p.StandardOutput.ReadToEnd()); // read output before WaitForExit to avoid a redirected-pipe deadlock
p.WaitForExit();
Console.WriteLine(sw.ElapsedMilliseconds);
I would expect some difference in time due to the managed/native nature of the code, but 16 s vs. 60 s looks confusing.
This question also mentions that native code works much faster, but there is no explanation of the difference, and understanding it is what I am mostly interested in. I would have thought that nowadays there should not be such a big gap in such a simple case (I/O plus math?).
There are three implementations of SHA512 in .NET Framework (and which one SHA512.Create() returns depends on the CryptoConfig.AllowOnlyFipsAlgorithms
property):

System.Security.Cryptography.SHA512Managed
System.Security.Cryptography.SHA512CryptoServiceProvider
System.Security.Cryptography.SHA512Cng

These implementations can have very different performance. Call SHA512.Create(string)
with each of those strings, and compare the results.
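That comparison can be sketched as follows. The class names passed to SHA512.Create(string) are the standard .NET Framework CryptoConfig names (on .NET Core/5+ some of them may resolve to null, so the sketch checks for that), and the file path is the one from the question:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

class Sha512Comparison
{
    static void Main()
    {
        var path = @"C:\Temp\BigFile.dat"; // same big file as in the question
        var names = new[]
        {
            "System.Security.Cryptography.SHA512Managed",               // pure managed
            "System.Security.Cryptography.SHA512CryptoServiceProvider", // native, CryptoAPI
            "System.Security.Cryptography.SHA512Cng"                    // native, CNG
        };

        foreach (var name in names)
        {
            using (var sha = SHA512.Create(name))
            {
                if (sha == null)
                {
                    Console.WriteLine($"{name}: not available on this runtime");
                    continue;
                }
                var sw = Stopwatch.StartNew();
                using (var fs = File.OpenRead(path))
                {
                    sha.ComputeHash(fs); // time only the hash computation
                }
                Console.WriteLine($"{name}: {sw.ElapsedMilliseconds} ms");
            }
        }
    }
}
```

If the managed implementation is the slow one, that would explain most of the gap against certutil, which uses the native CNG primitives.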