php arrays similarity

Finding similar strings in array


I need to harness similar_text() for an array of values that look something like this:

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

What I'm trying to do is find the words that are practically the same, e.g. lawyer and lawyers in the above array, and add their counts together in a new array.

So lawyer would be 4 (3 + 1), since lawyers would be folded into the original string lawyer.

Keep in mind that this array will only ever contain single words, and its length is unspecified: it could range from 1 to more than 99 entries.

I had no idea where to start with this, so I gave it a crack with a nested foreach loop, as you'll see below, but the output isn't what I expected.

foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}

Note: the match threshold is set to 80% for this example (the match for lawyer and lawyers is ~92%).
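
For reference, the ~92% figure can be checked directly. This is only a small sketch of how similar_text() reports its result: the function returns the number of matching characters and writes the percentage into its optional third argument. The variable names $matching and $percent are just illustrative.

```php
<?php
// similar_text() returns the count of matching characters and, through the
// optional third argument (passed by reference), the similarity as a percentage.
$matching = similar_text("lawyer", "lawyers", $percent);

echo $matching, "\n";          // 6 matching characters ("lawyer")
echo round($percent, 2), "\n"; // 92.31, i.e. 6 * 2 / (6 + 7) * 100
```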

The nested loop ends up giving me something like the following:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
    [lawyers] => 2
)

Whereas I need it to be:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
)

Notice how I need it to effectively remove lawyers and add its count to lawyer.


Solution

  • You can always remove the near-duplicate key with unset() once its count has been added to the matching word:

    unset($counts[$key_two]);
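
As an alternative to removing keys afterwards, one possible approach is to compare each word only against the words already kept, so a near-duplicate never gets its own entry in the first place. This is a minimal sketch, not the answerer's method: the function name mergeSimilarCounts and its $threshold parameter are hypothetical, and it assumes the first word seen in a group should become the key for that group.

```php
<?php
// Hypothetical helper: merge the counts of words that similar_text()
// rates above $threshold percent. The first word seen in a group is kept.
function mergeSimilarCounts(array $strings, float $threshold = 80.0): array
{
    $counts = [];

    foreach ($strings as $word => $count) {
        $match = null;

        // Compare only against words that already have an entry in $counts.
        foreach (array_keys($counts) as $kept) {
            // Cast to string because PHP turns numeric-looking keys into ints.
            similar_text((string) $kept, (string) $word, $percent);
            if ($percent > $threshold) {
                $match = $kept;
                break;
            }
        }

        if ($match === null) {
            $counts[$word] = $count;    // new word: start its own entry
        } else {
            $counts[$match] += $count;  // near-duplicate: fold into the kept word
        }
    }

    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(mergeSimilarCounts($strings));
```

With the sample array this prints lawyer => 4, business => 3, a => 3. Note that the result depends on the order of the input: if lawyers appeared before lawyer, lawyers would be the key that is kept.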