
PHP substring matching whole words


I am trying to write a StringMatch function that returns true when the words from one string can be found in another string. The exception is that I don't want matches for things like plurals and other suffixes, and I would also like to avoid matching when a word is prefixed.

To explain more visually:

apple watch - apple watches (no match)
apple watch - apple watch repairs (match)
apple watch - new apple watch (match)
apple watch - pineapple watch (no match)

What I would like is this:

echo StringMatch("apple watch", "apple watches");       // output 0
echo StringMatch("apple watch", "apple watch repairs"); // output 1
echo StringMatch("apple watch", "new apple watch");     // output 1
echo StringMatch("apple watch", "pineapple watch");     // output 0

I have had some basic success with strpos(), but I cannot figure out how to return "0" when the second string contains suffixes or prefixes, as in the examples above.

Here is how I'm trying to solve it:

function StringMatch($str1,$str2)
{
    if (SomeFunctionOrRegex($str1,$str2) !== false)
    {
        return(1);
    }
    else
    {
        return(0);
    }
}

Perhaps there is a graceful regex solution. I have tried strpos() but it is not strict enough for my needs.


Solution

  • Like this, as I said in the comments:

    function StringMatch($str1,$str2)
    {
      return preg_match('/\b'.preg_quote($str1,'/').'\b/i', $str2);
    }
    
    echo StringMatch("apple watch", "apple watches");       // output 0
    echo "\n";
    echo StringMatch("apple watch", "apple watch repairs"); // output 1
    echo "\n";
    echo StringMatch("apple watch", "new apple watch");     // output 1
    echo "\n";
    echo StringMatch("apple watch", "pineapple watch");     // output 0
    echo "\n";
    

    Output:

    0
    1
    1
    0
    


    preg_quote() is necessary to avoid issues where $str1 contains characters such as ".", which in a regex matches any character.
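
    To make the effect of preg_quote() concrete, here is a minimal sketch comparing a quoted and an unquoted pattern. The strings "v1.5" and "firmware v125 released" are hypothetical examples, not taken from the question.

```php
<?php
// Hypothetical example strings, chosen so that the "." matters.
$needle   = 'v1.5';
$haystack = 'firmware v125 released';

// Unquoted: "." is a wildcard, so "v1.5" also matches "v125".
$unquoted = preg_match('/\b' . $needle . '\b/i', $haystack);

// Quoted: preg_quote() turns "." into "\.", so only a literal dot matches.
$quoted = preg_match('/\b' . preg_quote($needle, '/') . '\b/i', $haystack);

echo $unquoted, "\n"; // 1
echo $quoted, "\n";   // 0
```

    The second argument to preg_quote() is the pattern delimiter, so a "/" inside $str1 gets escaped as well.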

    Furthermore you could strip punctuation like this

    $str1 = preg_replace('/[^\w\s]+/', '', $str1);
    

    For example:

    echo StringMatch("apple watch.", "apple watch repairs"); // output 1
    

    Without removing the punctuation, this will return 0. Whether or not that is important is up to you.
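
    The difference can be checked side by side. This is only a sketch: the helper name stringMatchWords and its $stripPunctuation parameter are made up for illustration and are not part of the answer's function.

```php
<?php
// Hypothetical helper: the whole-word match with optional punctuation stripping.
function stringMatchWords($str1, $str2, $stripPunctuation = false)
{
    if ($stripPunctuation) {
        // Drop everything that is neither a word character nor whitespace.
        $str1 = preg_replace('/[^\w\s]+/', '', $str1);
    }
    return preg_match('/\b' . preg_quote($str1, '/') . '\b/i', $str2);
}

$kept     = stringMatchWords('apple watch.', 'apple watch repairs');
$stripped = stringMatchWords('apple watch.', 'apple watch repairs', true);

echo $kept, "\n";     // 0, the trailing "." has nothing to match
echo $stripped, "\n"; // 1, "apple watch" is found as whole words
```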


    UPDATE

    Match out of order, for example:

    //words out of order
    echo StringMatch("watch apple", "new apple watch");     // output 1
    

    The easy way is implode/explode:

    function StringMatch($str1,$str2)
    {
      //strip punctuation from $str1; if you would rather keep it, use preg_quote() instead
      $str1 = preg_replace('/[^\w\s]+/', '', $str1);
      //$str1 = preg_quote($str1,'/');
      $words = explode(' ', $str1);
      preg_match_all('/\b('.implode('|',$words).')\b/i', $str2, $matches);
      return count($words) == count($matches[0]) ? '1' : '0';
    }
    


    You can also skip the explode/implode and use

     $str1 = preg_replace('/\s+/', '|', $str1);
    

    This can be combined with the other preg_replace (\s+ rather than \s, so that a run of spaces does not produce an empty alternative):

     $str1 = preg_replace(['/[^\w\s]+/','/\s+/'], ['','|'], $str1);
    

    Or all together

    function StringMatch($str1,$str2)
    {
      $str1 = preg_replace(['/[^\w\s]+/','/\s+/'], ['','|'], $str1);
      preg_match_all('/\b('.$str1.')\b/i', $str2, $matches);
      return (substr_count($str1, '|')+1) == count($matches[0]) ? '1' : '0';
    }
    


    In this version there is no words array to count, but you can count the | pipes instead, which is one less than the number of words (hence the +1). That only matters if you care that all the words match.
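
    One caveat with comparing counts: preg_match_all() counts every occurrence, so a repeated word in $str2 can stand in for a missing one. For example, "watch apple" against "apple apple pie" gives two matches for two words. A minimal sketch of a stricter variant, assuming each word must appear at least once, is below. The name StringMatchAllWords is hypothetical, and it returns integers rather than the '1'/'0' strings used above.

```php
<?php
// Hypothetical stricter variant: every word of $str1 must occur in $str2 as a whole word.
function StringMatchAllWords($str1, $str2)
{
    // Strip punctuation, then split on runs of whitespace, skipping empty pieces.
    $str1  = preg_replace('/[^\w\s]+/', '', $str1);
    $words = preg_split('/\s+/', $str1, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return 0; // nothing to look for
    }
    foreach ($words as $word) {
        // preg_quote() is a safeguard in case the stripping step above is changed.
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $str2) !== 1) {
            return 0;
        }
    }
    return 1;
}

echo StringMatchAllWords("watch apple", "new apple watch"), "\n"; // 1
echo StringMatchAllWords("watch apple", "apple apple pie"), "\n"; // 0
echo StringMatchAllWords("apple watch", "pineapple watch"), "\n"; // 0
```

    Checking one word at a time costs a preg_match() call per word, which should be fine for short search phrases.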