Search code examples
phpmysqlregexedit-distance

Generate regular expression for given string and edit distance


I have the problem that I want to match all strings in the database having a certain edit distance to a given string.

My idea was to generate a regular expression that would match all strings with edit distance d to string s.

So for example I want to generate a regex r for d = 1 and s = 'abc' in the form of: r = 'abc|.abc|.bc|a.c|ab.|abc.' and so on. But I'm not sure if this is very efficient or are there already some good algorithms to that problem? I want to consider even character swaps in the edit distance. so 'acb' should also be part of r. I want to realise it in PHP and then make an SQL query: SELECT * FROM table WHERE name RLIKE TheRegularExpression.

Is it a good way to make it like that? Or what would you recommend?


Solution

  • Probably the best thing to do is build up an iterative process for all the possibilities. In other words, something like this:

    function findall($startString) {
        // create an array of all strings that are distance one away
        // each element would be $returnArray["abc"] = "abc";
    }
    
    $d = 2; // distance
    $myArray[$startString] = $startString;
    
    for($i = 0; $i < $d; $i++) {
        $newCombos = array_merge(array(), $myArray);
        foreach($myArray as $element) {
            $newCombos = array_merge($newCombos, findall($element));
        }
        $myArray = array_merge(array(), $newCombos);
    }
    
    $myRegex = implode("|", $myArray);