I'm trying to sanitize comments on my page, but I only want to remove HTML tags etc. from content outside the [code][/code] tags.
For content inside the tags, I only want to apply htmlspecialchars($data, ENT_QUOTES, 'UTF-8').
Say I have a comment that looks like this:
<a>some text</a>
<a>some text</a>
[code]<p>some text</p>[/code]
<div>some text</div>
<div>some text</div>
[code]<p>some text</p>[/code]
<div>hfghgf</div>
<div>some text</div>
My filter looks like this:
function sanitize($data) {
    $data = trim($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
    return $data;
}
How can I filter everything outside the [code] tags with my sanitize() function, and use only htmlspecialchars() on content inside the [code] tags? I also have to account for multiple [code] tags in one comment.
For your sample input, this seems a bit more direct:
Code:
$data=<<<HTML
<a>some text</a>
<a>some text</a>
[code]<p>some text</p>[/code]
<div>some text</div>
<div>some text</div>
[code]<p>some text</p>[/code]
<div>hfghgf</div>
<div>some text</div>
HTML;
$data = strip_tags( // strip any residual tags from the string
    preg_replace_callback(
        '~\[code].*?\[/code]~is', // match [code]-wrapped substrings
        function ($m) {
            // escape HTML special characters inside the [code] block
            return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        },
        $data
    )
);
var_export($data);
Output:
'some text
some text
[code]&lt;p&gt;some text&lt;/p&gt;[/code]
some text
some text
[code]&lt;p&gt;some text&lt;/p&gt;[/code]
hfghgf
some text'
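Note that the snippet above only strips tags from the text outside the [code] blocks; it does not run htmlspecialchars() on it, so stray quotes or ampersands out there stay unescaped. If you also want the outside text escaped the way your sanitize() function does, one possible approach is to split the comment on the [code] blocks and treat the pieces separately. This is only a rough sketch: sanitize_comment() is a name I made up for illustration, and it assumes [code] blocks are not nested.

```php
<?php
// Hypothetical helper (not from the question or the answer above).
// Assumes [code]...[/code] blocks are never nested.
function sanitize_comment(string $data): string
{
    // The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps the [code] blocks
    // in the result: even indexes are outside text, odd indexes are blocks.
    $parts = preg_split(
        '~(\[code\].*?\[/code\])~is',
        trim($data),
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    if ($parts === false) {
        // Regex failure: fall back to treating everything as outside text.
        return htmlspecialchars(strip_tags($data), ENT_QUOTES, 'UTF-8');
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // [code] block: escape only, keep the markup visible as text.
            $parts[$i] = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        } else {
            // Outside text: strip tags first, then escape what is left.
            $parts[$i] = htmlspecialchars(strip_tags($part), ENT_QUOTES, 'UTF-8');
        }
    }

    return implode('', $parts);
}
```

Because each piece is handled on its own, any number of [code] blocks in one comment is covered, and a block at the very start of the comment just produces an empty first piece.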