string algorithm levenshtein-distance fuzzy-search

Selective edit distance

I have data like

Mega Mall
Mega Malls
L & T Gate 6
L & T Gate 5
L & T Gate 2
Megas Mall
Mega Mwll

Now I want to clean it up. I used an edit-distance approach with a maximum distance of 1, and that handles the Mega Mall case. The shortcoming is that it also deletes L & T Gate 5 and L & T Gate 2 (I am keeping the first entry). Is there any way to keep these entries while still handling typos and the like?

Solution

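Yes, you can use a weighted form of edit distance, without really changing the algorithm or its time or space complexity. Instead of counting any substitution, insertion or deletion as 1, count it as a higher number when the character (or either of the characters, for a substitution) involved is a digit. With a threshold of 1, an edit that touches a digit then exceeds the threshold, so L & T Gate 5 and L & T Gate 6 stay distinct, while a typo in a letter still costs 1 and is merged.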
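As a rough illustration, here is a minimal sketch of that idea in Python. The function names (`char_cost`, `weighted_edit_distance`, `dedupe`) and the digit weight of 3 are my own assumptions for the example, not anything standard; any weight larger than your threshold would do.

```python
def char_cost(ch, digit_weight=3):
    # Assumed weighting: edits that touch a digit cost more than other edits.
    return digit_weight if ch.isdigit() else 1


def weighted_edit_distance(a, b, digit_weight=3):
    # Standard two-row Levenshtein DP; only the per-edit costs differ.
    prev = [0]
    for cb in b:
        prev.append(prev[-1] + char_cost(cb, digit_weight))
    for ca in a:
        cost_a = char_cost(ca, digit_weight)
        cur = [prev[0] + cost_a]
        for j, cb in enumerate(b):
            cost_b = char_cost(cb, digit_weight)
            if ca == cb:
                substitute = prev[j]
            else:
                # A substitution is expensive if either character is a digit.
                substitute = prev[j] + max(cost_a, cost_b)
            delete = prev[j + 1] + cost_a
            insert = cur[j] + cost_b
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


def dedupe(names, threshold=1, digit_weight=3):
    # Keep the first entry of each group of near-duplicates.
    kept = []
    for name in names:
        if all(weighted_edit_distance(name, k, digit_weight) > threshold
               for k in kept):
            kept.append(name)
    return kept


data = ["Mega Mall", "Mega Malls", "L & T Gate 6", "L & T Gate 5",
        "L & T Gate 2", "Megas Mall", "Mega Mwll"]
print(dedupe(data))
```

On the sample data from the question, the three misspelled Mega Mall variants are within distance 1 of the first entry and are dropped, while the gate entries differ by a digit substitution (cost 3 here) and are all kept.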

It's even possible to weight specific positions in the string differently. E.g. you might decide that every letter immediately following 1 or more digits should be considered more important (since e.g. the address 123B is very different from 123).
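One way to sketch the position-dependent variant is to precompute a weight for every position in each string and have the DP look those up. The names `position_weights` and `positional_edit_distance`, and the weights of 3, are hypothetical choices for this example; the rule "a letter right after a digit is important" is just the one described above.

```python
def position_weights(s, digit_weight=3, suffix_weight=3):
    # Assumed rule: digits are heavy, and so is a letter that
    # immediately follows a digit (e.g. the "B" in "123B").
    weights = []
    for i, ch in enumerate(s):
        if ch.isdigit():
            weights.append(digit_weight)
        elif ch.isalpha() and i > 0 and s[i - 1].isdigit():
            weights.append(suffix_weight)
        else:
            weights.append(1)
    return weights


def positional_edit_distance(a, b, digit_weight=3, suffix_weight=3):
    wa = position_weights(a, digit_weight, suffix_weight)
    wb = position_weights(b, digit_weight, suffix_weight)
    # Same two-row DP as plain Levenshtein, with per-position costs.
    prev = [0]
    for w in wb:
        prev.append(prev[-1] + w)
    for i, ca in enumerate(a):
        cur = [prev[0] + wa[i]]
        for j, cb in enumerate(b):
            mismatch = 0 if ca == cb else max(wa[i], wb[j])
            substitute = prev[j] + mismatch
            delete = prev[j + 1] + wa[i]
            insert = cur[j] + wb[j]
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


print(positional_edit_distance("123B", "123"))
print(positional_edit_distance("Mega Mall", "Mega Mwll"))
```

Because the weights are computed once per string up front, the DP itself keeps the same time and space complexity as the unweighted version.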