Tags: php, string-comparison, similarity

PHP nearest string comparison


Possible Duplicate:
String similarity in PHP: levenshtein like function for long strings

I have my subject string

$subj = "Director, My Company";

and a list of multiple strings to be compared:

$str1 = "Foo bar";
$str2 = "Lorem Ipsum";
$str3 = "Director";

What I want to achieve here is to find the nearest string related to $subj. Is it possible to do it?


Solution

  • The levenshtein() function does what you are asking for. The Levenshtein algorithm counts the minimum number of single-character insertions, deletions, and replacements required to transform one string into another. The result is called the edit distance. The candidate with the smallest distance to $subj is the nearest string.

    This example is derived from the documentation of the PHP levenshtein() function.

    <?php
    
    $input = 'Director, My Company';
    
    // array of words to check against
    $words  = array('Foo bar','Lorem Ipsum','Director');
    
    // no shortest distance found, yet
    $shortest = -1;
    
    // loop through words to find the closest
    foreach ($words as $word) {
    
        // calculate the distance between the input word,
        // and the current word
        $lev = levenshtein($input, $word);
    
        // check for an exact match
        if ($lev == 0) {
    
            // closest word is this one (exact match)
            $closest = $word;
            $shortest = 0;
    
            // break out of the loop; we've found an exact match
            break;
        }
    
        // if this distance is less than or equal to the shortest distance
        // found so far, OR if no candidate has been found yet
        if ($lev <= $shortest || $shortest < 0) {
            // set the closest match, and shortest distance
            $closest  = $word;
            $shortest = $lev;
        }
    }
    
    echo "Input word: $input\n";
    if ($shortest == 0) {
        echo "Exact match found: $closest\n";
    } else {
        echo "Did you mean: $closest?\n";
    }
    

    The script's output is:

    Input word: Director, My Company
    Did you mean: Director?
    

    Good Luck!
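
If you need the full ranking rather than only the single best match, the same idea can be wrapped in a small function. This is only a sketch: rankByDistance() is a hypothetical helper name, not part of PHP, and it assumes the candidates are distinct, non-numeric strings so they can safely be used as array keys. It also assumes PHP 7.3 or later for array_key_first().

```php
<?php

/**
 * Map each candidate to its Levenshtein distance from $subject,
 * sorted so that the nearest candidate comes first.
 * (rankByDistance is a hypothetical helper, not a built-in.)
 */
function rankByDistance(string $subject, array $candidates): array
{
    $distances = [];
    foreach ($candidates as $candidate) {
        // levenshtein() counts single-character insertions,
        // deletions and replacements
        $distances[$candidate] = levenshtein($subject, $candidate);
    }

    // sort by distance, ascending, keeping the candidate strings as keys
    asort($distances);

    return $distances;
}

$ranked = rankByDistance(
    'Director, My Company',
    ['Foo bar', 'Lorem Ipsum', 'Director']
);

$closest = array_key_first($ranked);
echo "Closest: $closest (distance {$ranked[$closest]})\n";
// prints "Closest: Director (distance 12)"
```

The distance of 12 corresponds to deleting the 12 characters of ", My Company". Because raw edit distance grows with the length difference, a short candidate can lose to an unrelated one of similar length. If that matters for your data, consider dividing the distance by the longer string's length before comparing.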