I have the problem that I want to match all strings in the database having a certain edit distance to a given string.
My idea was to generate a regular expression that would match all strings with edit distance d
to string s
.
So for example I want to generate a regex r
for d = 1
and s = 'abc'
in the form of: r = 'abc|.abc|.bc|a.c|ab.|abc.'
and so on. But I'm not sure if this is very efficient or are there already some good algorithms to that problem? I want to consider even character swaps in the edit distance. so 'acb'
should also be part of r
. I want to realise it in PHP and then make an SQL query: SELECT * FROM table WHERE name RLIKE TheRegularExpression
.
Is it a good way to make it like that? Or what would you recommend?
Probably the best thing to do is build up an iterative process for all the possibilities. In other words, something like this:
function findall($startString) {
// create an array of all strings that are distance one away
// each element would be $returnArray["abc"] = "abc";
}
$d = 2; // distance
$myArray[$startString] = $startString;
for($i = 0; $i < $d; $i++) {
$newCombos = array_merge(array(), $myArray);
foreach($myArray as $element) {
$newCombos = array_merge($newCombos, findall($element));
}
$myArray = array_merge(array(), $newCombos);
}
$myRegex = implode("|", $myArray);