PHP Stop Word List

I'm playing about with stop words in my code. I have an array full of words that I'd like to check, and an array of stop words I want to check against.

At the moment I'm looping through the array one word at a time and removing the word if it's in the stop word list (checked with in_array), but I wonder if there's a better way of doing it. I've looked at array_diff and the like; however, if I have multiple stop words in the first array, array_diff only appears to remove the first occurrence.

The focus is on speed and memory usage, but speed more so.

Edit -

The first array holds single words taken from blog comments (these are usually quite long); the second array holds single stop words. Sorry for not making that clear.

Thanks

Solution

Using str_replace...

A simple approach is to use str_replace or str_ireplace, which can take an array of 'needles' (things to search for), corresponding replacements, and an array of 'haystacks' (things to operate on).

$haystacks=array(
  "The quick brown fox",
  "jumps over the ",
  "lazy dog"
);

$needles=array(
  "the", "lazy", "quick"
);

$result=str_ireplace($needles, "", $haystacks);

var_dump($result);

This produces

array(3) {
  [0]=>
  string(11) "  brown fox"
  [1]=>
  string(12) "jumps over  "
  [2]=>
  string(4) " dog"
}

As an aside, a quick way to clean up the leading and trailing spaces this leaves is to use array_map to call trim on each element:

$result=array_map("trim", $result);

The drawback of using str_replace is that it will replace matches found within words, rather than just whole words. To address that, we can use regular expressions...
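To see the problem concretely, here is a minimal sketch (the input string is only an illustration) in which the stop word "the" also matches inside "theme":

```php
<?php
// Illustrative input: "the" appears both as a whole word and inside "theme".
$haystacks = array("fortify the general theme");

// str_ireplace matches substrings, so both occurrences are removed.
$result = str_ireplace("the", "", $haystacks);

var_dump($result[0]); // "fortify  general me"
```

The whole word "the" is gone as intended, but "theme" has been cut down to "me".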

Using preg_replace...

An approach using preg_replace looks very similar to the above, but the needles are regular expressions, and we check for a 'word boundary' at the start and end of each match using \b:

$haystacks=array(
  "For we shall use fortran to",
  "fortify the general theme",
  "of this torrent of nonsense"
);

$needles=array(
  '/\bfor\b/i', 
  '/\bthe\b/i', 
  '/\bto\b/i', 
  '/\bof\b/i'
);

$result=preg_replace($needles, "", $haystacks);
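
If the stop words are stored as a plain word list rather than as ready-made patterns, one possible variation is to build a single pattern from the list. The sketch below is an assumption layered on the approach above, not part of it: the helper name remove_stop_words is hypothetical, preg_quote escapes any regex metacharacters in the words, and a second preg_replace collapses the gaps left behind before trimming.

```php
<?php
// Hypothetical helper (not from the original answer): removes whole-word
// stop words from each string, case-insensitively.
function remove_stop_words(array $haystacks, array $stopWords): array
{
    // Escape each word so regex metacharacters are treated literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopWords);

    // One alternation wrapped in word boundaries.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // preg_replace accepts an array of subjects and returns an array.
    $result = preg_replace($pattern, '', $haystacks);

    // Collapse the runs of whitespace left behind, then trim both ends.
    $result = preg_replace('/\s+/', ' ', $result);
    return array_map('trim', $result);
}

$haystacks = array(
    "For we shall use fortran to",
    "fortify the general theme",
    "of this torrent of nonsense"
);

$result = remove_stop_words($haystacks, array("for", "the", "to", "of"));

var_dump($result);
// "we shall use fortran", "fortify general theme", "this torrent nonsense"
```

Note that "fortran", "fortify", "theme" and "torrent" survive intact because \b only allows matches on whole words. Whether one combined pattern is faster than an array of separate patterns depends on the data, so it is worth benchmarking with a realistic stop word list.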