Tags: php, replace, bbcode, htmlspecialchars, strip-tags

Sanitize content outside of [code] tags with one filter and content inside [code] tags with another filter


I'm trying to sanitize comments on my page, but I only want to remove HTML tags and the like from content outside the [code] [/code] tags.

For content inside the tags, I only want to apply htmlspecialchars($data, ENT_QUOTES, 'UTF-8');.

So if I have a comment that looks like this:

<a>some text</a>
<a>some text</a>
[code]<p>some text</p>[/code]
<div>some text</div>
<div>some text</div>
[code]<p>some text</p>[/code]
<div>hfghgf</div>
<div>some text</div>

My filter looks like this:

function sanitize($data) {
    $data = trim($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
    return $data;
}

How can I filter everything outside the [code] tags with my sanitize() function and apply only htmlspecialchars() to the content inside the [code] tags? I also have to account for multiple [code] blocks in one comment.


Solution

  • For your sample input, this seems a bit more direct:

    Code:

    $data=<<<HTML
    <a>some text</a>
    <a>some text</a>
    [code]<p>some text</p>[/code]
    <div>some text</div>
    <div>some text</div>
    [code]<p>some text</p>[/code]
    <div>hfghgf</div>
    <div>some text</div>
    HTML;
    
    $data = strip_tags(                                                  // strip the tags that remain outside the [code] blocks
        preg_replace_callback(
            '~\[code].*?\[/code]~is',                                    // match each [code]...[/code] block (case-insensitive, spans newlines)
            function ($m) {
                return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');     // escape the block so strip_tags() finds no tags in it
            },
            $data
        )
    );
    
    var_export($data);
    

    Output:

    'some text
    some text
    [code]&lt;p&gt;some text&lt;/p&gt;[/code]
    some text
    some text
    [code]&lt;p&gt;some text&lt;/p&gt;[/code]
    hfghgf
    some text'
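
Note that the snippet above only strips tags outside the [code] blocks; it does not run htmlspecialchars() on that outer text, which the question's sanitize() does. If that is wanted too, one possible approach is to split the comment on the [code] blocks and treat the two kinds of chunk separately. The following is only a minimal sketch under that assumption: sanitize_comment() is a hypothetical name, and it trims the whole comment once instead of trimming every chunk, so the line breaks around the [code] blocks are kept.

```php
<?php
// Hypothetical helper (not from the question or the answer): escapes
// [code] blocks and fully sanitizes everything around them.
function sanitize_comment(string $comment): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, the result
    // alternates: even indexes are outside text, odd indexes are the
    // captured [code]...[/code] blocks.
    $parts = preg_split(
        '~(\[code\].*?\[/code\])~is',
        $comment,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    if ($parts === false) {
        // Regex failure (e.g. backtrack limit): treat everything as outside text.
        $parts = [$comment];
    }

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Inside [code]: escape only, keep the markup visible as text.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        } else {
            // Outside [code]: strip tags, then escape what is left.
            $out .= htmlspecialchars(strip_tags($part), ENT_QUOTES, 'UTF-8');
        }
    }

    // Trim once at the end rather than per chunk.
    return trim($out);
}
```

Calling sanitize_comment($data) on the sample comment should give the same result as the output shown above, since the sample's outer text has no characters left to escape once the tags are stripped. As with any strip_tags() use, a stray "<" in the outer text can cause the text after it to be removed.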