Search code examples
phplevenshtein-distance

How can I generate all variants of a word within 1-edit distance (Levenshtein)?


I would like to generate all variants of a word within 1-edit distance using Levenshtein distance.

PHP has a function that will take two strings as parameter and will return just the number (int) of insert, replace and delete operations needed to transform str1 into str2. PHP Manual - levenshtein

int levenshtein ( string $str1 , string $str2 )

I am looking for a PHP solution to create an algorithm that generates the variants of a given word.


Solution

  • This is pretty easy for distance 1. Generating all possibilities for distances > 1 becomes somewhat more complex.

    Start with a word:

    $input = 'word';
    

    Split the word into letters and generate a list of replacements.

    $letters = str_split($input);
    
    $alphabet = range('a', 'z');
    

    Deletions are the easiest, just loop over each position and replace with '':

    foreach ($letters as $i => $letter) {
        $variants[] = substr_replace($input, '', $i, 1);
    }
    

    Insertions and replacements can be done at the same time, because they'll both require a loop over the letters in the input nested inside a loop over the alphabet.

    foreach ($alphabet as $variation) {
        foreach ($letters as $i => $letter) {
    
            // insertion
            $variants[] = substr($input, 0, $i) . $variation . substr($input, $i);
    
            // substitution
            // (check that the letter is different or you'll get multiple copies of the input)
            if ($variation != $letter) {
                $variants[] = substr_replace($input, $variation, $i, 1);
            }
        }
        $variants[] = $input . $variation; // handle insertion at the end
    }
    

    You can check the results to verify the levenshtein distances are correct:

    foreach ($variants as $variant) {
        $result[$variant] = levenshtein($input, $variant);
    }