
HTML Purifier - iframe and scripts


I'm using HTML Purifier in my project.

My HTML looks something like this (simple HTML elements plus a script and an iframe):

<p>content...<p>
<iframe></iframe>
<script>alert('abc');</script>
<p>content2</p>

With the default config, it turns into this:

<p>content...</p>
<p></p>
<p>content2</p>

But if I set the config like this...

$config->set('HTML.Trusted', true);
$config->set('HTML.SafeIframe', true);

I get this:

<p>content...</p>
<p>
    <iframe></iframe>
    <script type="text/javascript"><!--//--><![CDATA[//><!--
    alert('abc');
    //--><!]]></script>
</p>
<p>content2</p>

Is there any way to use HTML Purifier to completely remove the 'script' tag but preserve the 'iframe' tag? Or is there an alternative to HTML Purifier that can do this?

I've tried

$config->set('Filter.YouTube', true);
$config->set('URI.SafeIframeRegexp', '%^https://(www.youtube.com/embed/|player.vimeo.com/video/)%');

But the 'script' tag is still there.

Edit: full example:

$config = HTMLPurifier_Config::createDefault();

$html = "<p>content...<p><iframe ...></iframe><script>alert('abc');</script><p>content2</p>";

$config->set('HTML.ForbiddenElements', 'script');

$purifier = new HTMLPurifier($config);

$clean_html = $purifier->purify($html);

Result

<p>content...</p><p></p><p>content2</p>

Solution

  • You were half on the right track. Leave HTML.Trusted off (its default), set HTML.SafeIframe to true, and set URI.SafeIframeRegexp to a regex matching the URLs you want to accept (%^https://(www.youtube.com/embed/|player.vimeo.com/video/)% works fine). Then an input like:

    <p>content...<p>
    <iframe src="https://www.youtube.com/embed/blep"></iframe>
    <script>alert('abc');</script>
    <p>content2</p>
    

    ...turns into...

    <p>content...</p><p>
    <iframe src="https://www.youtube.com/embed/blep"></iframe>
    
    </p><p>content2</p>
    

    Explanation: HTML.SafeIframe allows the <iframe> tag, but HTML Purifier still expects a whitelist of URLs the iframe may load, since an unrestricted <iframe> could embed any page, malicious ones included. URI.SafeIframeRegexp supplies that whitelist as a regex that the iframe's src URL must match.

    See if that works for you!

    Code

    This is the code that made the transformation I just mentioned:

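    <?php
    // Assumes HTML Purifier was installed with Composer (ezyang/htmlpurifier);
    // adjust the path if you use the standalone HTMLPurifier.auto.php instead.
    require_once 'vendor/autoload.php';

    $dirty = '<p>content...<p>
    <iframe src="https://www.youtube.com/embed/blep"></iframe>
    <script>alert(\'abc\');</script>
    <p>content2</p>';
    
    $config = HTMLPurifier_Config::createDefault();
    // Allow <iframe>, but only for sources matching the whitelist regex below.
    // HTML.Trusted stays at its default (false), so <script> is still removed.
    $config->set('HTML.SafeIframe', true);
    $config->set('URI.SafeIframeRegexp', '%^https://(www.youtube.com/embed/|player.vimeo.com/video/)%');
    
    $purifier = new HTMLPurifier($config);
    
    $clean = $purifier->purify($dirty);
    echo $clean;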
    

    Regarding HTML.Trusted

    I implore you to never set HTML.Trusted to true if you don't fully trust each and every one of the people submitting the HTML.

    Amongst other things, it lets forms in the input HTML survive purification unmolested, which (if you're purifying for a website, as I assume you are) makes phishing attacks trivial. It lets style tags through unscathed, and, as you saw, it lets <script> tags through as well. It will still strip some things (any tag HTML Purifier doesn't know about, which includes most HTML5 tags, and various JavaScript attribute handlers), but it leaves open enough attack vectors that you might as well not be purifying at all if you use this directive. As Ambush Commander (HTML Purifier's author) once put it:

    You shouldn't be using %HTML.Trusted anyway; it really ought to be named %HTML.Unsafe or something.
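To see the URL whitelist from the answer in isolation, here is a minimal sketch in plain PHP (HTML Purifier is not loaded) that runs candidate iframe URLs through the URI.SafeIframeRegexp pattern with preg_match, roughly the way HTML Purifier matches the pattern against an iframe's src URI. The helper name isWhitelisted, the sample URLs, and the escaped-dot STRICT_PATTERN variant are illustrative assumptions, not part of the answer. One thing it shows: the dots in the answer's pattern are unescaped, so each matches any character, and a look-alike host such as www-youtube.com also passes.

```php
<?php
// Sketch: which iframe src values does a SafeIframeRegexp-style whitelist accept?
// Uses only PHP's built-in preg_match; HTML Purifier itself is not involved.

// Pattern from the answer (dots unescaped, so "." matches any character).
const LOOSE_PATTERN = '%^https://(www.youtube.com/embed/|player.vimeo.com/video/)%';

// Hypothetical stricter variant with escaped dots (not taken from the answer).
const STRICT_PATTERN = '%^https://(www\.youtube\.com/embed/|player\.vimeo\.com/video/)%';

// Hypothetical helper: true only when the pattern matches the URL.
function isWhitelisted(string $pattern, string $src): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $src) === 1;
}

$candidates = [
    'https://www.youtube.com/embed/blep',
    'https://player.vimeo.com/video/123',
    'https://www-youtube.com/embed/blep', // look-alike host
    'https://evil.example/embed/',
];

foreach ($candidates as $src) {
    printf(
        "%-40s loose=%s strict=%s\n",
        $src,
        isWhitelisted(LOOSE_PATTERN, $src) ? 'yes' : 'no',
        isWhitelisted(STRICT_PATTERN, $src) ? 'yes' : 'no'
    );
}
```

Both patterns should accept the two real embed URLs and reject evil.example; only the loose one should accept the www-youtube.com look-alike, which is why escaping the dots is worth considering if you reuse the pattern.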