
Sanitize content outside of [code] tags with one filter and content inside [code] tags with another filter

I'm trying to sanitize comments on my page, but I only want to remove HTML tags etc. from content outside the [code] [/code] tags.

For content inside the tags, I only want to apply htmlspecialchars($data, ENT_QUOTES, 'UTF-8').

Say I have a comment that looks like this:

<a>some text</a>
<a>some text</a>
[code]<p>some text</p>[/code]
<div>some text</div>
<div>some text</div>
[code]<p>some text</p>[/code]
<div>some text</div>

My filter looks like this:

function sanitize($data) {
    $data = trim($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
    return $data;
}

How can I filter everything outside the [code] tags with my sanitize() function and use only htmlspecialchars() on the content inside the [code] tags? I also have to account for multiple [code] tags in one comment.


  • For your sample input, this seems a bit more direct:

    Code: (Demo)

    $data = <<<'HTML'
    <a>some text</a>
    <a>some text</a>
    [code]<p>some text</p>[/code]
    <div>some text</div>
    <div>some text</div>
    [code]<p>some text</p>[/code]
    <div>some text</div>
    HTML;

    $data = strip_tags(                                              // strip any residual tags from the string
        preg_replace_callback(                                       // rewrite each match via the callback
            '~\[code].*?\[/code]~is',                                // match [code]-wrapped substrings
            function ($m) {
                return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8'); // convert html entities as intended
            },
            $data
        )
    );
    var_export($data);

    Output:

    'some text
    some text
    [code]&lt;p&gt;some text&lt;/p&gt;[/code]
    some text
    some text
    [code]&lt;p&gt;some text&lt;/p&gt;[/code]
    some text'
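
Note that the snippet above only strips tags outside the [code] blocks; it does not run htmlspecialchars() on that outside text. If the full filter from the question is wanted there, one possible sketch is to split the comment on the [code] blocks with preg_split() and treat the pieces differently. The names sanitize_outside() and sanitize_comment() are made up for this example, and moving trim() to the very end (so line breaks between segments survive) is an assumption of the sketch, not something taken from the question.

```php
<?php
// Filter for text outside [code] blocks: strip tags, then escape.
// trim() is deliberately not applied per segment (assumption of this sketch).
function sanitize_outside(string $data): string {
    return htmlspecialchars(strip_tags($data), ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: applies a different filter inside and outside [code] blocks.
function sanitize_comment(string $comment): string {
    // With PREG_SPLIT_DELIM_CAPTURE the captured [code]...[/code] blocks stay in
    // the result: odd indexes hold [code] blocks, even indexes hold the rest.
    $parts = preg_split('~(\[code].*?\[/code])~is', $comment, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // Regex failure: fall back to the stricter filter for the whole comment.
        return trim(sanitize_outside($comment));
    }

    foreach ($parts as $i => $part) {
        $parts[$i] = ($i % 2 === 1)
            ? htmlspecialchars($part, ENT_QUOTES, 'UTF-8') // inside [code]: escape only
            : sanitize_outside($part);                      // outside: strip tags, then escape
    }

    return trim(implode('', $parts));
}
```

Calling sanitize_comment($data) on the sample comment should handle any number of [code] blocks, since every block becomes its own odd-indexed element in the split result.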