How can I match similar words when counting array_diff results?
The problem is that a single concept can appear under several names, such as TV/Television, Inches/Inch, Mobile/Mobile Phones, or Mobile/Phones. This produces a wrong percentage when I count the result of array_diff.
Example:
$str1 = "Samsung Television 21 Inches LED BH005DE";
$str2 = "Samsung 21 Inch LED TV";
$arr1 = explode(' ', $str1);
$arr2 = explode(' ', $str2);
$differenceCount = count(array_diff($arr2, $arr1));
In the example above, $str1 and $str2 contain Television/TV and Inches/Inch. How can I solve this problem?
The most obvious way is to map each group of synonyms to a single canonical word:
$str1 = "Samsung Television 21 Inches LED BH005DE";
$str2 = "Samsung 21 Inch LED TV";
//synonyms:
$syns = [
    'TV'   => ['TV', 'Television'],
    'Inch' => ['Inch', 'Inches']
];
//replace:
$str1 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
{
    return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
}, $str1);
//now, str1 looks like "Samsung TV 21 Inch LED BH005DE"
$str2 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
{
    return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
}, $str2);
//now, str2 looks like "Samsung 21 Inch LED TV"
$arr1 = explode(' ', $str1);
$arr2 = explode(' ', $str2);
//var_dump(array_diff($arr1, $arr2)); // [5 => 'BH005DE'] (array_diff preserves keys)
In your case you can't rely on word-form normalization alone (such as Inch/Inches), because you also need to handle abbreviations such as TV/Television, and those are specific cases that no general rule covers. Thus, an explicit synonym list may be the only way to resolve the matter for all cases.
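If you need this in more than one place, the replacement step above could be wrapped in a small helper. The following is only a sketch under a few assumptions: the function names normalizeSynonyms and differenceCount are made up for illustration, the "i" flag (case-insensitive matching) is an addition that the snippet above does not have, and preg_quote() is there in case a synonym ever contains a regex metacharacter.

```php
<?php
// Hypothetical helper: replace every synonym in $syns with its canonical key.
function normalizeSynonyms(string $text, array $syns): string
{
    foreach ($syns as $canonical => $variants) {
        // Escape regex metacharacters in each variant before building the pattern.
        $alternatives = implode('|', array_map(function ($v) {
            return preg_quote($v, '/');
        }, $variants));
        // Assumption: the "i" flag is added so that "tv" or "inches" also match.
        $text = preg_replace('/\b(?:' . $alternatives . ')\b/i', $canonical, $text);
    }
    return $text;
}

// Hypothetical helper: count the words of $b that are missing from $a
// after both strings have been normalized.
function differenceCount(string $a, string $b, array $syns): int
{
    $wordsA = preg_split('/\s+/', normalizeSynonyms($a, $syns), -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/\s+/', normalizeSynonyms($b, $syns), -1, PREG_SPLIT_NO_EMPTY);
    return count(array_diff($wordsB, $wordsA));
}

$syns = [
    'TV'   => ['TV', 'Television'],
    'Inch' => ['Inch', 'Inches'],
];

$str1 = "Samsung Television 21 Inches LED BH005DE";
$str2 = "Samsung 21 Inch LED TV";

echo differenceCount($str1, $str2, $syns), "\n"; // 0: every word of $str2 is in $str1
echo differenceCount($str2, $str1, $syns), "\n"; // 1: only "BH005DE" is missing from $str2
```

Note that array_diff itself is still case-sensitive, so only the words listed in $syns are brought to a common spelling; everything else is compared exactly as written.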