I would like to use str_replace() to place span elements around HTML strings for the purpose of highlighting them.
However, the following does not work when there is a &nbsp; inside the string. I've tried replacing the &nbsp; with ' ', but this did not help.
You can recreate the problem using the code below:
$str_to_replace = "as a way to incentivize more purchases.";
$replacement = "<span class='highlighter'>as a way to incentivize more purchases.</span>";
$subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");
$output = str_replace($str_to_replace,$replacement,$subject);
.highlighter{
background-color: yellow;
}
So I tried your code and ran into the same problem you did. Interesting, right? The problem is that there's actually another character between the final "e" in "incentivize" and the " more". You can see it if you split $subject into two parts, the text up to and including "to incentivize" and everything after it:
// splits the webpage into two parts
$x = explode('to incentivize', $subject);
// print the char code for the first character of the second string
// (the character right after the second e in incentivize) and also
// print the rest of the webpage following this mystery character
exit("keycode of invisible character: " . ord($x[1]) . " " . $x[1]);
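which prints: keycode of invisible character: 194 Â more ...
Look! There's our mystery character, and its first byte has the value 194 (0xC2). Note that ord() only reports a single byte. In UTF-8, 0xC2 is the lead byte of a two-byte sequence, and 0xC2 0xA0 is a non-breaking space (U+00A0), so that is most likely what is sitting between the two words.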
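To confirm what the hidden character is, it can help to look at the byte after 194 as well. The following is only a sketch: $sample is a made-up, hard-coded stand-in for the downloaded page, and it assumes the page is UTF-8 with a non-breaking space in that position.

```php
<?php
// Hypothetical sample standing in for the downloaded page: the gap after
// "incentivize" is a UTF-8 non-breaking space (bytes 0xC2 0xA0), not " ".
$sample = "as a way to incentivize\xC2\xA0more purchases.";

// Same split as above, then take the first two bytes of the second part.
$parts = explode('to incentivize', $sample);
$firstTwoBytes = substr($parts[1], 0, 2);

echo ord($firstTwoBytes[0]), "\n";        // first byte as a decimal number
echo bin2hex($firstTwoBytes), "\n";       // both bytes in hex
var_dump($firstTwoBytes === "\xC2\xA0");  // true if it is a UTF-8 non-breaking space
```

If the hex dump of the real page shows c2a0 at that position, the character is a non-breaking space. Any other second byte means it is a different character and needs a different fix.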
Perhaps this website embeds these characters to make it difficult to do exactly what you're doing, or perhaps it's just a bug. In any case, you can use preg_replace instead of str_replace and change $str_to_replace like so:
$str_to_replace = "/as a way to incentivize(.*?)more purchases\./";
$replacement = "<span class='highlighter'>as a way to incentivize more purchases.</span>";
$subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");
$output = preg_replace($str_to_replace,$replacement,$subject);
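and now this does what you want. The (.*?) handles the mysterious hidden character. The pattern has to include the trailing period, because the replacement supplies its own; otherwise the output ends up with two. You can probably tighten this regular expression further, or at least cap it at a maximum number of characters with (.{0,5}), but in either case you likely want to stay flexible. Do not write the cap as ([.]{0,5}): inside square brackets the dot is a literal period, not "any character".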
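If the hidden character really is a non-breaking space, a narrower alternative to (.*?) is to let each space in the search phrase match a UTF-8 non-breaking space, a literal &nbsp; entity, or ordinary whitespace. This is a sketch under that assumption, not a general HTML highlighter. $subject is a hard-coded stand-in for the downloaded page, and $needle and $gap are names invented for this example.

```php
<?php
// Hypothetical sample standing in for the downloaded page.
$subject = "<p>Offers credits as a way to incentivize\xC2\xA0more purchases. More text.</p>";

// The phrase to highlight, written with ordinary spaces.
$needle = 'as a way to incentivize more purchases.';

// A gap between words may be a UTF-8 non-breaking space (bytes C2 A0),
// the &nbsp; entity, or regular whitespace.
$gap = '(?:\xC2\xA0|&nbsp;|\s)+';

// Escape regex metacharacters in the phrase, then widen every space to $gap.
$pattern = '/' . str_replace(' ', $gap, preg_quote($needle, '/')) . '/';

// $0 re-inserts the text exactly as it was matched, so the page's own
// spacing characters are preserved inside the span.
$output = preg_replace($pattern, '<span class="highlighter">$0</span>', $subject);

echo $output, "\n";
```

Using $0 in the replacement, rather than retyping the phrase, keeps the original non-breaking space in place, so the highlighted text still renders the same way in the browser.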