Search code examples
stringhashstring-comparisonstring-searchstring-hashing

Is there a hashing function that can be used in finding similar (not necessarily equal) strings?


What I need is a hashing function that operates on fixed data sizes, obviously for non security purposes. It needs to map similar strings to similar or equal hashes, in other words small changes in strings should make no or really small changes to hashes.

for example: my name is John, my name is Jon should have the same or really similar hashes. my name is John, your name is Liam should result in somewhat similar hashes. my name is John, I live in USA should give totally different hashes. and so on!

Is there a hashing function for similar purposes?


Solution

  • There is no reliable way of achieving this. This is due to the pigeonhole principle; there are far fewer ways that two short strings can be "close" than two long strings.

    However, there is the concept of fuzzy hashing, which might get you part of the way there.