I'm trying to program a search function that highlights the search query in the result. At the moment I'm using this code for highlighting, which works fine:
$hightlight = preg_replace('/'.strtolower($query).'/', '<span class=hightlight>'.strtolower($query).'</span>', strtolower($text));
The text I'm searching in is a string from a database. The problem is that if the text contains HTML special characters, for example <test>
and the user searches for <te
I get the following result: <span class=hightlight><te</span>st>
which the browser interprets as st>
. This makes sense, but I don't want this. I want <test>
as result with <te
highlighted. So I need to escape the special characters. I know that there is the function htmlspecialchars
, but how can I use it in this case? Or is there another function? I can't escape them before searching, because then I'm also searching in the HTML entities. I also can't escape them after searching, because then the <span>
tags are already in the text and will also be converted to HTML entities. I hope you understand my problem. Does anyone have a solution for that?
Using a combination of htmlspecialchars()
and a regex negative lookahead, I think we're able to solve this.
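<?php
$text = "this is just my really basic <test> of words";
$query = "<te";
// Escape both the text and the query so they are compared in the same entity-encoded form.
$text = htmlspecialchars($text);
$query = htmlspecialchars($query);
// preg_quote() keeps regex metacharacters in the query (., /, ?, ...) literal.
// The negative lookahead skips matches that sit inside an entity such as &lt;
// $0 in the replacement stands for the matched text.
$highlight = preg_replace('/'.preg_quote(strtolower($query), '/').'(?![^\&]*\;)/', '<span class=highlight>$0</span>', strtolower($text));
echo $highlight;
?>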
(Small note: I took the liberty of changing hightlight to highlight.)
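The part of this that solves the issue mentioned in your comment is the negative lookahead: (?![^\&]*\;)
It rejects a match whenever a ; follows it with no & in between. In practice that means "only match text that is not between & and ;", so the search never hits the inside of an entity such as &lt;.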
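To see what the lookahead accepts and rejects, here is a small sketch of the same idea wrapped in a function. The name highlightEscaped is made up for this illustration and is not part of the code above; the behaviour assumes the default htmlspecialchars() flags.

```php
<?php
// Hypothetical helper for illustration only.
function highlightEscaped(string $text, string $query): string
{
    // Lowercase and escape both sides so they are compared in the same form.
    $text  = htmlspecialchars(strtolower($text));
    $query = htmlspecialchars(strtolower($query));

    // Same negative lookahead as above: skip a match if a ";" follows
    // before any "&", i.e. if the match sits inside an entity.
    $pattern = '/' . preg_quote($query, '/') . '(?![^\&]*\;)/';

    return preg_replace($pattern, '<span class=highlight>$0</span>', $text);
}

$text = 'this is just my really basic <test> of words';

// "<te" becomes "&lt;te"; after the match an "&" comes before any ";",
// so the lookahead lets it through and the span wraps "&lt;te".
echo highlightEscaped($text, '<te'), "\n";

// "lt" only occurs inside "&lt;", directly before ";",
// so the lookahead rejects it and nothing is highlighted.
echo highlightEscaped($text, 'lt'), "\n";
```

The second call is the case the lookahead exists for: without it, a search for lt would wrap part of the entity in a span and break the markup.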
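Now, this could obviously run into issues in some edge cases where & and ; are part of the actual text. For example, a match that is followed by a literal ; with no & in between (as in "foo; bar" when searching for foo) is skipped, even though it is not inside an entity. If you're not doing any sort of text and query limitation/sanitization, I'm not sure that there's anything that will work for all possible cases.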
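One direction that might avoid the ; edge case is a different technique from the lookahead: search the raw text first, split it around the matches, and only then escape each piece on its own. The following is a rough sketch under that assumption, not a drop-in replacement. The function name highlightRaw is hypothetical, and unlike the code above it keeps the original letter case by using the i modifier instead of strtolower().

```php
<?php
// Hypothetical helper for illustration only.
function highlightRaw(string $text, string $query): string
{
    if ($query === '') {
        // Nothing to highlight; just escape the text.
        return htmlspecialchars($text);
    }

    // The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps the matched
    // pieces in the result; they end up at the odd indexes.
    $pattern = '/(' . preg_quote($query, '/') . ')/i';
    $parts   = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape every piece separately, then wrap only the matches.
        $escaped = htmlspecialchars($part);
        $out .= ($i % 2 === 1)
            ? '<span class="highlight">' . $escaped . '</span>'
            : $escaped;
    }
    return $out;
}

// A literal ";" after the match is no longer a problem here.
echo highlightRaw('search this; then <test>', 'search'), "\n";
echo highlightRaw('<test>', '<te'), "\n";
```

Because the search runs on the unescaped text, it can never land inside an entity, and because the span tags are added after escaping, they are not converted themselves. The i modifier only folds ASCII case unless the u modifier is added for UTF-8 input.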