I have a program that currently hashes files using just SHA1. No other options. It hashes them using the SHA1 hash function that's part of the Lazarus and Free Pascal Compiler.
I've since added the ability to use MD5, SHA256 and SHA512 by using the DCPCrypt library (http://wiki.lazarus.freepascal.org/DCPcrypt or http://www.cityinthesky.co.uk/opensource). Everything is working fine; however, my earlier version hashed the file with a 2 MB buffer if the file was larger than 1 MB. Otherwise, it used the default buffer of 1024 bytes, like this:
if SizeOfFile > 1048576 then // if > 1 MB
begin
  fileHashValue := SHA1Print(SHA1File(NameOfFileToHash, 2097152)); // 2 MB buffer
end
else
  fileHashValue := SHA1Print(SHA1File(NameOfFileToHash)); // 1024 byte buffer
However, my hashing functions and procedures have now been moved into a single function controlled by radio button status, to make my code more object-oriented. It has all 4 hashing options coded within it, and which section is run depends on which RadioButton.Checked status the program finds. The SHA1 code, for example, now looks like this:
..
SourceData := TFileStream.Create(FileToBeHashed, fmOpenRead);
..
else if SHA1RadioButton2.Checked = true then
begin
  varSHA1Hash := TDCP_SHA1.Create(nil);
  varSHA1Hash.Init;
  varSHA1Hash.UpdateStream(SourceData, SourceData.Size); // HOW DO I ADD A BUFFER HERE?
  varSHA1Hash.Final(DigestSHA1);
  varSHA1Hash.Free;
  for i := 0 to 19 do // 40 character output
    GeneratedHash := GeneratedHash + IntToHex(DigestSHA1[i],2);
end // End of SHA1 if
My question is: how do I add a buffer size to varSHA1Hash.UpdateStream if the file found is 'large' (say, bigger than 1 MB)? This is important because a 300 MB file, for example, takes 4 seconds with my earlier version but 9 seconds with my 'improved' version that uses the DCPCrypt library! So the time for large files has more than doubled, even though my code reads much better. If I can get varSHA1Hash.UpdateStream to read several MB at a time instead of using 8 KB buffers (which is what the UpdateStream procedure does, if you read the library code), I expect it to be faster. As it stands, my understanding is that varSHA1Hash.UpdateStream(SourceData, SourceData.Size); basically uses the entire size of the file as the buffer?
If it helps, here is the UpdateStream procedure from
procedure TDCP_hash.UpdateStream(Stream: TStream; Size: longword);
var
  Buffer: array[0..8191] of byte;
  i, read: integer;
begin
  dcpFillChar(Buffer, SizeOf(Buffer), 0);
  for i:= 1 to (Size div Sizeof(Buffer)) do
  begin
    read:= Stream.Read(Buffer,Sizeof(Buffer));
    Update(Buffer,read);
  end;
  if (Size mod Sizeof(Buffer))<> 0 then
  begin
    read:= Stream.Read(Buffer,Size mod Sizeof(Buffer));
    Update(Buffer,read);
  end;
end;
I have also looked at some other libraries, such as Delphi Encryption Compendium (http://home.netsurf.de/wolfgang.ehrhardt/crchash_en.html) and the Wolfgang Ehrhardt library (http://www.torry.net/pages.php?id=519#939342), and also the one that is included with DoubleCommander, but for various reasons (simplicity being one) I am trying to do this using DCPCrypt.
To answer your question: you cannot pass a different buffer size to UpdateStream, but you can change the array size in dcpcrypt2.pas in the method you mentioned and recompile DCPCrypt. It is open source, after all.
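If recompiling the library is not appealing, another route is to skip UpdateStream and feed the hash yourself through the public Update method, using whatever buffer size you like. The following is only a minimal sketch: UpdateStreamBuffered and SHA1OfStream are hypothetical helper names (not part of DCPCrypt), the unit names DCPcrypt2 and DCPsha1 assume the Lazarus port of the library, and BufSize is assumed to be greater than zero.

```pascal
program BufferedHashSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DCPcrypt2, DCPsha1;

// Hypothetical helper (not part of DCPCrypt): feeds the rest of Stream to an
// already initialised hash in chunks of BufSize bytes (BufSize must be > 0).
procedure UpdateStreamBuffered(Hash: TDCP_hash; Stream: TStream; BufSize: LongInt);
var
  Buffer: array of Byte;
  BytesRead: LongInt;
begin
  SetLength(Buffer, BufSize);
  repeat
    BytesRead := Stream.Read(Buffer[0], BufSize);
    if BytesRead > 0 then
      Hash.Update(Buffer[0], BytesRead);
  until BytesRead <= 0;
end;

// Hypothetical helper: returns the SHA1 digest of Stream as 40 hex characters.
function SHA1OfStream(Stream: TStream; BufSize: LongInt): string;
var
  Hash: TDCP_sha1;
  Digest: array[0..19] of Byte;
  i: Integer;
begin
  Result := '';
  Hash := TDCP_sha1.Create(nil);
  try
    Hash.Init;
    UpdateStreamBuffered(Hash, Stream, BufSize);
    Hash.Final(Digest);
  finally
    Hash.Free;
  end;
  for i := 0 to High(Digest) do
    Result := Result + IntToHex(Digest[i], 2);
end;

var
  Source: TFileStream;
begin
  // File name is taken from the first command line parameter.
  Source := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    WriteLn(SHA1OfStream(Source, 2 * 1024 * 1024)); // 2 MB buffer
  finally
    Source.Free;
  end;
end.
```

Because the loop simply reads until the stream is exhausted, it also avoids the longword Size parameter of UpdateStream, which cannot express sizes of 4 GB or more.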
But this will not help much: the sha1 unit of FPC is not faster because of the larger buffer size but because of a faster implementation of the SHA1 algorithm. It makes use of compiler intrinsics to rotate values, which is a heavily used operation in SHA1.
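To illustrate what "rotate via an intrinsic" means, here is a small sketch comparing FPC's built-in RolDWord with a hand-written rotate built from two shifts and an OR. RolManual is a hypothetical name used only for this illustration; whether a given library uses the manual form is something to check in its source.

```pascal
program RotateSketch;
{$mode objfpc}{$H+}
{$R-}{$Q-} // the manual version relies on bits being shifted out silently

// Hypothetical helper: rotate left by n bits (1..31) using shifts and OR.
function RolManual(x: DWord; n: Byte): DWord;
begin
  Result := DWord(x shl n) or (x shr (32 - n));
end;

var
  v: DWord;
begin
  v := $80000001;
  // RolDWord is an FPC intrinsic that the compiler can map to a single
  // rotate instruction on CPUs that have one.
  WriteLn(HexStr(RolDWord(v, 5), 8));
  WriteLn(HexStr(RolManual(v, 5), 8));
end.
```

Both lines should print the same value. The difference is only in how many instructions the compiler has to emit for each rotate, which adds up because SHA1 performs several rotates per round.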
Just try the following program with different numerical command line parameters (e.g. 8192 and 8388608):
uses
  sysutils, sha1;
begin
  writeln(SHA1Print(SHA1File('bigfile', StrToInt(paramstr(1)))));
end.
At least on my PC, it makes no difference whether the buffer is 8k or 8M. If you use lower values like 1024, you will see a slight slowdown (10-20%).
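If you would rather see the timings side by side than time separate runs by hand, a rough sketch along these lines may help. TimeRun is a hypothetical helper name, the file name comes from the first command line parameter, and GetTickCount64 from SysUtils is assumed to be available in your FPC version. Note that the operating system's file cache can make the first run look slower than the later ones, so it is worth running the program twice.

```pascal
program BufferTimingSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, sha1;

// Hypothetical helper: hashes FileName with the given buffer size and
// prints the elapsed wall-clock time in milliseconds.
procedure TimeRun(const FileName: string; BufSize: PtrUInt);
var
  StartTick: QWord;
  Digest: string;
begin
  StartTick := GetTickCount64;
  Digest := SHA1Print(SHA1File(FileName, BufSize));
  WriteLn(BufSize:10, ' bytes: ', GetTickCount64 - StartTick, ' ms  ', Digest);
end;

begin
  TimeRun(ParamStr(1), 1024);
  TimeRun(ParamStr(1), 8192);
  TimeRun(ParamStr(1), 8388608);
end.
```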