I have the following use case:
Incrementally hashing a file isn't the problem: just call TransformBlock and TransformFinalBlock.
The problem is that I need multiple hashes of data that shares its beginning bytes, but after I have called TransformFinalBlock to read the Hash of the first n bytes, I cannot continue to hash with the same object and need a new one.
Searching for the problem, I saw that both Python and OpenSSL have an option to copy a hashing object for exactly this purpose:
hash.copy()
Return a copy (“clone”) of the hash object. This can be used to efficiently compute the digests of strings that share a common initial substring.
EVP_MD_CTX_copy_ex() can be used to copy the message digest state from in to out. This is useful if large amounts of data are to be hashed which only differ in the last few bytes. out must be initialized before calling this function.
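To illustrate what I mean, here is a minimal Python sketch of the copy pattern described in the quote above. The helper name digests_with_shared_prefix and the sample inputs are made up for this example; only hashlib.md5, update(), copy() and hexdigest() are the real library calls.

```python
import hashlib


def digests_with_shared_prefix(prefix, suffixes):
    """Hash `prefix` once, then branch off one clone per suffix.

    Hypothetical helper for illustration only: `prefix` is the shared
    beginning (bytes), `suffixes` is a list of differing tails (bytes).
    """
    base = hashlib.md5()
    base.update(prefix)           # the shared beginning is hashed only once
    results = []
    for suffix in suffixes:
        clone = base.copy()       # independent copy of the running hash state
        clone.update(suffix)      # continue hashing on the clone only
        results.append(clone.hexdigest())
    return results


if __name__ == "__main__":
    for hex_digest in digests_with_shared_prefix(b"common header", [b"", b" tail A", b" tail B"]):
        print(hex_digest)
```

Each clone carries the state reached after the shared prefix, so the prefix bytes are never fed through the algorithm twice. This is the behaviour I am looking for in the .NET HashAlgorithm.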
Search as I may, I can't find anything within the stock C# HashAlgorithm that would allow me to effectively Clone() (i.e. copy) such an object before calling its TransformFinalBlock method, and afterwards continue to hash the rest of the data with the clone.
I found a C# reference implementation for MD5 that could be trivially adapted to support cloning(*) but would strongly prefer to use what is there instead of introducing such a thing into the codebase.
(*) Indeed, as far as I understand, any Hashing Algorithm (as opposed to encryption/decryption) I've bothered to check is trivially copyable because all the state such an algorithm has is a form of a digest.
So am I missing something here or does the standard C#/.NET interface in fact not offer a way to copy the hash object?
Another data point:
Microsoft's own native API for crypto services has a function CryptDuplicateHash, the docs of which state, quote:
The CryptDuplicateHash function can be used to create separate hashes of two different contents that begin with the same content.
Been around since Windows XP. :-|
Note wrt. MD5: The use case is not cryptographically sensitive. Just reliable file checksumming.
SIGH
The stock .NET library does not allow this. Sad. Anyways, there are a couple of alternatives:
- MD5Managed: pure .NET ("default" MD5 RSA license)
- ClonableHash: wraps the MS Crypto API via PInvoke (may need some work extracting that from the Org.Mentalis namespace, but the license is permissive)
It is also possible to, for example, wrap a C++ implementation in a C++/CLI wrapper. Preliminary tests have shown that this seems to be way faster than the normal .NET library, but don't take my word for it.
Since then, I have also written/adapted a C++-based solution myself: https://github.com/bilbothebaggins/md5cpp
It hasn't gone into production, because the requirements changed, but it was a nice exercise and I like to think it works quite well. (Other than it not being a pure C# implementation.)