I need to anonymize personal data in our MySQL database. The problem is that I still need to be able to link two persons together after they have been anonymized.
I thought this could be done by hashing their social security number or e-mail address, which led to my question:
When I hash two equal strings (s1 and s2) and get two hash values (h1 and h2), how sure can I be that:
1) the hash values are equal (h1 = h2)
2) no unequal string (s3 ≠ s1) will produce the same hash value?
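A minimal sketch of the hashing idea, assuming SHA-256 as the hash function and assuming identifiers are normalized (trimmed and lowercased) before hashing so that trivially different spellings still link. The helper name `pseudonymize` is hypothetical. For the same UTF-8 input, MySQL's built-in `SHA2(str, 256)` should produce the same 64-character hex string, so the hashing could also be done server-side.

```python
import hashlib


def pseudonymize(identifier: str) -> str:
    """Hypothetical helper: hex SHA-256 digest of a normalized identifier."""
    # Assumed normalization: strip surrounding whitespace and lowercase,
    # so "Alice@Example.com " and "alice@example.com" map to the same value.
    normalized = identifier.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


h1 = pseudonymize("alice@example.com")
h2 = pseudonymize("Alice@Example.com ")
h3 = pseudonymize("bob@example.com")

print(h1 == h2)  # True: equal (normalized) inputs give equal hashes
print(h1 == h3)  # False: different inputs give different hashes here
print(len(h1))   # 64 hex characters = 256 bits
```

Normalizing before hashing matters because the hash only links values that are byte-for-byte identical.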
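1) Equal strings will always produce equal hash values, as long as you use the same hash function and the same character encoding for the input.
2) Different strings can in theory produce the same hash (a collision), and this becomes likely if the hash is short compared with the number of values you store. With the default output lengths, 32 hex characters for MD5 (128 bits) or 40 for SHA-1 (160 bits), an accidental collision is so unlikely that you won't run into it in practice.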
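To put numbers on "small hash length compared to data volume", here is a sketch using the standard birthday approximation p ≈ 1 − exp(−n(n−1) / 2N), where n is the number of distinct inputs and N = 2^bits is the number of possible hash values. The row count of 10 million is a hypothetical figure chosen only for illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items distinct inputs share a hash."""
    space = 2.0 ** hash_bits
    # Birthday approximation p = 1 - exp(-n(n-1) / (2N)); expm1 keeps precision
    # when the probability is extremely small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


# Hypothetical table of 10 million persons:
# 32 bits is a deliberately short hash; 128 bits = MD5 (32 hex), 160 bits = SHA-1 (40 hex).
for bits in (32, 128, 160):
    print(bits, collision_probability(10_000_000, bits))
```

A 32-bit hash is almost certain to collide at this volume, while the 128- and 160-bit outputs give probabilities many orders of magnitude below anything you would encounter.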