c# algorithm hash perfect-hash

How to generate a unique identifier for the address structure?


I have a structure that describes an address. It looks like this:

class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string Zip { get; set; }
    public string Country { get; set; }
} 

I'm looking for a way to create a unique identifier for this structure (I assume it should also be a string) that depends on all of the structure's properties (e.g. a change to AddressLine1 will also change the structure's identifier).

I know I could just concatenate all the properties together, but that gives an identifier that is too long. I'm looking for something significantly shorter.

I also assume that the number of distinct addresses will not exceed 100M.

Any ideas on how this identifier can be generated?

Thanks in advance.

Some background on this:

There are several different tables in the database that hold some information plus address data. The data is stored in a format similar to the one described above.

Unfortunately, moving the address data into a separate table is very costly right now, but I hope it will be done in the future.

I need to associate some additional properties with the address data, and I'm going to create a separate table for this. That's why I need to uniquely identify the address data.


Solution

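  • Serialize all fields into a single binary value, for example by concatenating them with proper domain separation (such as length-prefixing each field, so that two different sets of field values can never produce the same byte sequence).

    Then hash that value with a cryptographic hash of sufficient length. I prefer 256 bits, but 128 are probably fine. Collisions are extremely rare with good hashes: by the birthday bound, 100M addresses hashed to 128 bits collide with a probability of roughly n²/2^129, or about 1.5 × 10^-23. With a 256-bit hash like SHA-256 they're practically impossible.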
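    The two steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the `AddressId` class and its `Compute` method are hypothetical names, length-prefixing via `BinaryWriter` is just one possible way to get domain separation, and treating a null field as an empty string is an assumption you may not want.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string Zip { get; set; }
    public string Country { get; set; }
}

// Hypothetical helper, not part of the original question or answer.
static class AddressId
{
    public static string Compute(Address a)
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer, Encoding.UTF8))
        {
            var fields = new[] { a.AddressLine1, a.AddressLine2, a.City, a.Zip, a.Country };
            foreach (var field in fields)
            {
                // BinaryWriter.Write(string) emits a length prefix followed by
                // the UTF-8 bytes, so ("ab", "c") and ("a", "bc") serialize
                // differently. Assumption: null is treated like an empty string.
                writer.Write(field ?? string.Empty);
            }
            writer.Flush();

            using (var sha = SHA256.Create())
            {
                // SHA-256 yields 32 bytes; Base64 encodes them as 44 characters.
                byte[] hash = sha.ComputeHash(buffer.ToArray());
                return Convert.ToBase64String(hash);
            }
        }
    }
}
```

    Calling `AddressId.Compute(address)` returns a 44-character string regardless of how long the address fields are. If 128 bits are enough, keeping only the first 16 bytes of the hash before encoding would shorten that to 24 characters. Note that this sketch hashes the fields exactly as stored, so addresses that differ only in case or whitespace get different identifiers.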