Task: I have two columns with product names. I need to find the most similar cell from Column B for Cell A1, then for A2, A3 and so on.
Input:
Col A | Col B
-------------
Red | Blackwell
Black | Purple
White | Whitewater
Green | Reddit
Output:
Red = Reddit / 66% similar
Black = Blackwell / 71% similar
White = Whitewater / 66% similar
Green = Reddit / 30% similar
I think Levenshtein distance can help with sorting, but I don't know how to apply it.
Thanks in advance, any piece of information helps.
<?php
// Arrays of words
$colA = ['Red', 'Black', 'White', 'Green'];
$colB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];

// Loop through the words to find the closest match for each
foreach ($colA as $a) {
    // Best result found so far for the current word
    $maxMatches = -1;
    $bestMatch = '';
    $matchPercentage = 0.0;

    foreach ($colB as $b) {
        // Number of matching characters; $percent is filled in by reference
        $matches = similar_text($a, $b, $percent);

        if ($matches > $maxMatches) {
            // Found a better match, update
            $maxMatches = $matches;
            $bestMatch = $b;
            $matchPercentage = $percent;
        }
    }

    echo "$a = $bestMatch / " .
        number_format($matchPercentage, 2) .
        "% similar\n";
}
The outer loop iterates through the elements of the first array. For each element, it initializes the best match found so far and the number of matching characters for that match.
The inner loop iterates through the array of possible matches, looking for the best one. For each candidate it measures the similarity (you could use levenshtein() here instead of similar_text(), but the latter is convenient because it calculates the percentage for you). If the current candidate is a better match than the current best, the best-match variables are updated.
For each word in the outer loop we echo the best match found and the percentage. Format as desired.
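If you would rather use levenshtein(), as the question suggests, here is a minimal sketch. Note that levenshtein() returns an edit distance (lower is better) rather than a percentage, so the sketch converts it to a score by dividing by the length of the longer string. That normalization is one common choice and an assumption on my part, not something PHP provides. The helper names levenshteinPercent and closestByLevenshtein are made up for illustration.

```php
<?php
// Hypothetical helper: turns a Levenshtein distance into a 0-100 similarity score.
// Assumption: normalize by the length of the longer string.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $maxLen) * 100;
}

// Hypothetical helper: returns [best candidate, score] for $needle.
function closestByLevenshtein(string $needle, array $candidates): array
{
    $best = null;
    $bestScore = -1.0;
    foreach ($candidates as $candidate) {
        $score = levenshteinPercent($needle, $candidate);
        if ($score > $bestScore) {
            // Higher score means fewer edits relative to length
            $bestScore = $score;
            $best = $candidate;
        }
    }
    return [$best, $bestScore];
}

$colA = ['Red', 'Black', 'White', 'Green'];
$colB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];

foreach ($colA as $a) {
    [$match, $score] = closestByLevenshtein($a, $colB);
    echo "$a = $match / " . number_format($score, 2) . "% similar\n";
}
```

The percentages will differ from the similar_text() ones because the two functions measure different things. Both are case-sensitive and work on bytes, so you may want to lowercase the inputs first if 'red' and 'Red' should count as equal.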