
PHP: array_diff count is wrong when one item has several similar names


How can I match similar words when counting with array_diff?

The problem is that a single item can be written in several ways, such as TV/Television, Inch/Inches, Mobile/Mobile Phones or Mobile/Phones, so the percentage I derive from the array_diff count comes out wrong.

Example:

    $str1 = "Samsung Television 21 Inches LED BH005DE";
    $str2 = "Samsung 21 Inch LED TV";

    $arr1 = explode(' ', $str1);
    $arr2 = explode(' ', $str2);

    $differenceCount = count(array_diff($arr2, $arr1));

Above, $str1 and $str2 contain Television/TV and Inches/Inch, which array_diff treats as different words. How can I solve this problem?


Solution

  • The most obvious way is to use synonyms for that:

    $str1 = "Samsung Television 21 Inches LED BH005DE";
    $str2 = "Samsung 21 Inch LED TV";
    
    //synonyms:
    $syns = [
       'TV'   => ['TV', 'Television'],
       'Inch' => ['Inch', 'Inches']
    ];
    
    //replace:
    $str1 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
    {
       // replace every variant of $x with the canonical word $x
       return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
    }, $str1);
    //now, str1 looks like "Samsung TV 21 Inch LED BH005DE"
    
    $str2 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
    {
       // same replacement, applied to the second string
       return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
    }, $str2);
    //now, str2 looks like "Samsung 21 Inch LED TV"
    
    $arr1 = explode(' ', $str1);
    $arr2 = explode(' ', $str2);
    
    
    //var_dump(array_diff($arr1, $arr2)); // [5 => 'BH005DE'] (array_diff preserves keys)
    $differenceCount = count(array_diff($arr2, $arr1)); // 0
    

    In your case you can't rely on word forms alone (such as Inch/Inches), because you also need to handle abbreviations, and those have specific meanings. So a synonym list may be the only way to cover all cases.
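
The same idea can be wrapped in a reusable function so that both strings go through identical normalization. The following is only a sketch under some assumptions: the helper name `normalizeWords` is hypothetical, matching is made case-insensitive (the code above is case-sensitive), and the percentage formula is a guess at what the question means by "percentage".

```php
<?php
// Hypothetical helper: splits $text into words and maps every known
// synonym to its canonical form, ignoring letter case.
function normalizeWords(string $text, array $synonyms): array
{
    // Build a lookup table: lowercase variant => canonical word.
    $lookup = [];
    foreach ($synonyms as $canonical => $variants) {
        foreach ($variants as $variant) {
            $lookup[strtolower($variant)] = $canonical;
        }
    }

    // Split on whitespace and drop empty tokens.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Replace each word by its canonical form when one is known.
    return array_map(function ($word) use ($lookup) {
        return $lookup[strtolower($word)] ?? $word;
    }, $words);
}

$syns = [
    'TV'   => ['TV', 'Television'],
    'Inch' => ['Inch', 'Inches'],
];

$arr1 = normalizeWords("Samsung Television 21 Inches LED BH005DE", $syns);
$arr2 = normalizeWords("Samsung 21 Inch LED TV", $syns);

$differenceCount = count(array_diff($arr2, $arr1));

// Assumed metric: share of the words in $arr2 that are missing from $arr1.
$differencePercent = count($arr2) > 0
    ? 100 * $differenceCount / count($arr2)
    : 0;
```

Because this version looks words up one at a time, it cannot handle multi-word variants such as "Mobile Phones"; for those, the regex replacement on the whole string shown above is the better fit.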