python similarity levenshtein-distance difflib

Finding similar strings with restricted alpha characters using Python


I want to group similar strings, but I'd like the grouping to be smart enough to treat two strings as the same when they differ only by convention characters such as '/' or '-', rather than by actual letters.

Given the following input:

moose
mouse
mo/os/e
m.ouse

alpha = ['/','.']

I want to group the strings while ignoring that restricted set of characters, so the output should be:

moose
mo/os/e

mouse
m.ouse

I'm aware I can find similar strings using difflib, but it doesn't provide an option for limiting the alphabet. Is there another way of doing this? Thank you.

Update:

Rather than restricting the set of letters, it is simpler to list the alpha characters to ignore and just check for their occurrences. I've changed the title accordingly.


Solution

  • Maybe something like:

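    from collections import defaultdict

    words = ['moose', 'mouse', 'mo/os/e', 'm.ouse']
    alpha = ['/', '.']

    # Key each word by its characters with the ignored ones stripped out
    container = defaultdict(list)
    for word in words:
        key = ''.join(ch for ch in word if ch not in alpha)
        container[key].append(word)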
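
For what it's worth, the same idea can be wrapped in a small helper so the groups are easy to inspect. This is only a sketch: the function name `group_ignoring` is made up for illustration, and it assumes every entry in `alpha` is a single character (it uses `str.translate` to delete them instead of the generator expression).

```python
from collections import defaultdict


def group_ignoring(words, ignored):
    """Group words that are identical once the `ignored` characters are removed.

    `group_ignoring` is a hypothetical helper name; `ignored` is assumed to
    hold single-character strings only.
    """
    # Translation table that deletes every ignored character
    table = str.maketrans('', '', ''.join(ignored))
    groups = defaultdict(list)
    for word in words:
        groups[word.translate(table)].append(word)
    return dict(groups)


words = ['moose', 'mouse', 'mo/os/e', 'm.ouse']
alpha = ['/', '.']

groups = group_ignoring(words, alpha)
for key, members in groups.items():
    print(key, members)
# moose ['moose', 'mo/os/e']
# mouse ['mouse', 'm.ouse']
```

Note that this only groups strings that match exactly after stripping. If you also want near matches between the stripped keys, you would still need something like difflib on top of it.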