
Deciding which string to pass as the first / second parameter of similar_text


I have a list of strings. For each string, I need to find the most similar string in another list of strings. Currently, I always pass the string from the first list as the first parameter of similar_text and the string from the second list as the second parameter, like this:

```php
foreach ($list_a as $str_a) {

    $most_similar_str = null;
    $most_similar_str_pct = 0;

    foreach ($list_b as $str_b) {

        // swapping the parameter order may yield a different result
        similar_text($str_a, $str_b, $pct);

        if ($pct > $most_similar_str_pct) {
            $most_similar_str = $str_b;
            $most_similar_str_pct = $pct; // was `pct`, missing the `$`
        }
    }

    echo "The most similar text for {$str_a} is {$most_similar_str}\n";

}
```

Swapping the first and second parameters of similar_text may yield a different result, so I am not sure which string should go first and which second to get the most accurate result.
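A minimal demonstration of the order dependence. similar_text finds the longest common substring (keeping the first one it encounters while scanning its first argument) and then recurses into the parts to the left and right of that match. When two different substrings tie for longest, the argument order decides which one anchors the split, and that changes the total. The two strings below are chosen so that "foo" and "bar" both tie at three characters; the commented values come from tracing that algorithm by hand.

```php
<?php
// "foo" and "bar" are both 3-character common substrings of these strings.
// With 'bafoobar' first, "foo" is found first; "ba" is then left over on the
// left side of both strings and adds 2 more matching characters (total 5).
// With 'barfoo' first, "bar" is found first; it sits at the end of 'bafoobar',
// so no remainder can be matched on either side (total 3).
$forward  = similar_text('bafoobar', 'barfoo', $pctForward);
$backward = similar_text('barfoo', 'bafoobar', $pctBackward);

printf("forward:  %d chars, %.2f%%\n", $forward, $pctForward);   // forward:  5 chars, 71.43%
printf("backward: %d chars, %.2f%%\n", $backward, $pctBackward); // backward: 3 chars, 42.86%
```

The percentage is the matched-character count times 200 divided by the combined length of both strings, so only the count differs between the two calls.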

I am also considering first finding the longer of $str_a and $str_b and always passing it as the first (or second) parameter.


Solution

  • I've run a lot of experiments with similar_text, swapping the parameters, and I've seen that the most accurate results are obtained when the first parameter is longer than the second. In that case strlen is your friend for deciding which string to pass as the first parameter.
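A minimal sketch of that advice applied to the loop from the question. The function names similar_text_longer_first and most_similar are hypothetical, not part of PHP. When both strings have the same length the helper keeps the original order, which is an arbitrary choice; the answer does not say what to do in that case.

```php
<?php
// Hypothetical helper: always pass the longer string as the FIRST argument
// to similar_text(). On equal lengths the original order is kept.
function similar_text_longer_first(string $a, string $b, ?float &$percent = null): int
{
    if (strlen($b) > strlen($a)) {
        [$a, $b] = [$b, $a];
    }
    return similar_text($a, $b, $percent);
}

// Hypothetical wrapper around the question's loop: for each string in $listA,
// pick the string in $listB with the highest similarity percentage
// (null if nothing matches). Duplicate entries in $listA share one key.
function most_similar(array $listA, array $listB): array
{
    $result = [];
    foreach ($listA as $strA) {
        $best = null;
        $bestPct = 0.0;
        foreach ($listB as $strB) {
            similar_text_longer_first($strA, $strB, $pct);
            if ($pct > $bestPct) {
                $best = $strB;
                $bestPct = $pct;
            }
        }
        $result[$strA] = $best;
    }
    return $result;
}

print_r(most_similar(['barfoo'], ['bafoobar', 'xyz']));
```

Because the helper reorders by length, similar_text_longer_first('barfoo', 'bafoobar') and similar_text_longer_first('bafoobar', 'barfoo') return the same count, so the result no longer depends on which list a string came from (except for equal-length pairs).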