Search code examples
c#hashunique

Unique, but realistic, object hash code


Ok, I am designing up a piece of software that will keep one system synced with another. The problem is that the originating system is some legacy DB2 nightmare with me only having read-only access and tables having no timestamping capability whatsoever, meaning no way to detect which rows were changed.

My idea is to just load all the rows (in total I will have about 60000 rows, synced every half hour) calculating their hashes, whilst keeping <ID, hash> tuples in my integration database. Then change detection becomes a job of comparing hashes and updating records in destination system where hashes mismatch or tuples missing altogether. Forgot to mention that reading source is cheap, updating destination is expensive, its a web service with a lot of background processing, so I would avoid updating everything every time.

Now, my problem, the c# builtin hashcode claims that its unsuitable for this purpose (equal hash does not imply equal object) and crypto hashes seem like a big overkill with 256+ bit hashes. I don't think more than 64bits is needed, that would give me 1 in 1010 chance of collision given perfectly distributed hash and allow fast hash comparison on x64 arch.

So what should I use to generate unique hashes?


Solution

  • Another option; calculate the hash in C# using a function like this;

    private readonly System.Security.Cryptography.HashAlgorithm hash = System.Security.Cryptography.SHA1.Create();
    
    public static string CalculateSignature(IEnumerable<object> values)
    {
        var sb = new StringBuilder();
        foreach (var value in values)
        {
            string valueToHash = value == null ? ">>null<<" : Convert.ToString(value, CultureInfo.InvariantCulture);
            sb.Append(valueToHash).Append(char.ConvertFromUtf32(0));
        }
        var signature = sb.ToString();
        var bytesToHash = Encoding.UTF8.GetBytes(signature);
        var hashedBytes = hash.ComputeHash(bytesToHash);
        signature = Encoding.UTF8.GetString(hashedBytes);
    
        return signature;
    }
    

    Edit: Hashing profiling test

    To show how fast SHA1 hashing is, here's a quick test. On my dev machine, I get 60,000 hashes in 176ms. MD5 takes 161

    var hash = System.Security.Cryptography.MD5.Create();
    
    var stringtoHash = "3490518cvm90wg89puse5gu3tgu3v0afgmvkldfjgmvvvvvsh,9semc9petgucm9234ucv0[vhd,flhgvzemgu904vq2m0";
    
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for(var i = 0; i < 60000; i++)
    {
        var bytesToHash = Encoding.UTF8.GetBytes(stringtoHash);
        var hashedBytes = hash.ComputeHash(bytesToHash);
        var signature = Encoding.UTF8.GetString(hashedBytes);
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);