Every day I get several Excel files from different companies with product information, and the only way to match the products to my own database is via the product name. These companies are not organized: the product names are typed by hand and may vary even within the same Excel file.
What is the best way to find the closest match in my own product list? Sometimes company A might call a product "Toy car 100", where I don't care which number it is; I simply call this "Toy car". However, sometimes they may call it something different, like "Provision for toy that is a car", and I then need to match that to my "Toy car" product.
What is the best way to do this kind of string matching? Basically, I want to match each incoming name to the most similar string in my list.
My current implementation involves writing many if statements like the one below, and adding a new if statement whenever a variation cannot be matched correctly.
foreach ($prodset as $p) {
    if (strpos(strtolower($dd['offer_name']), strtolower($p['prod_info'])) !== false && $p['active'] == 1) {
        $dd['product_id'] = $p['prod_id'];
        $result = $dd;
        return $result;
    }
}
You can use the similar_text function, which calculates the similarity between two strings. It returns the number of matching characters.
<?php
$base = 'Toy car';
$variations = array(
    'Provision for toy that is a car',
    'Toy that looks like a car',
    'Toy Car',
    'Toy CAR'
);

foreach ($variations as $variation) {
    // similar_text() returns the number of matching characters;
    // PHP_EOL puts each result on its own line.
    echo "{$base} and {$variation} = " . similar_text($base, $variation) . PHP_EOL;
}
Note that similar_text compares characters in a case-sensitive way, so you can apply strtolower
to both strings for better results.
Also note that the complexity of the algorithm is O(N**3),
where N is the length of the longest string, so it may be slow on long names or large product lists.
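To pick the closest product rather than just print scores, one option is to compare the incoming name against every product and keep the highest similarity percentage, which similar_text writes into its optional third argument. The following is only a rough sketch: the function name findClosestProduct, the $minPercent cutoff of 50, and the sample product list are assumptions made for illustration, not part of the original code, and the threshold would need tuning against real data.

```php
<?php
// Hypothetical helper: returns the product name most similar to $offerName,
// or null when no candidate reaches $minPercent similarity.
function findClosestProduct(string $offerName, array $productNames, float $minPercent = 50.0): ?string
{
    $best = null;
    $bestPercent = 0.0;
    // Lowercase first, because similar_text() is case-sensitive.
    $needle = strtolower(trim($offerName));

    foreach ($productNames as $name) {
        $percent = 0.0;
        // The third argument receives the similarity as a percentage (0 to 100).
        similar_text($needle, strtolower($name), $percent);
        if ($percent > $bestPercent) {
            $bestPercent = $percent;
            $best = $name;
        }
    }

    return $bestPercent >= $minPercent ? $best : null;
}

// Sample data, assumed for illustration only.
$products = array('Toy car', 'Garden hose', 'Desk lamp');

var_dump(findClosestProduct('Toy CAR 100', $products));
var_dump(findClosestProduct('zzzz', $products));
```

Returning null below the cutoff lets unmatched names be flagged for manual review instead of being silently assigned to the wrong product. Long descriptive names such as "Provision for toy that is a car" will score lower against "Toy car" than near-exact variants, so they may fall under the cutoff.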