
Forbid script tags and event listeners in jqte jQuery text editor using htmlpurifier


I am using jqte to give users of a CMS I wrote a WYSIWYG editor for their content. To output the content publicly I use HTML Purifier, so editors cannot do any harm to the visitors of the site.

They could however place

<button onclick="alert('this sux')">klick me</button>

in the textarea and the next user will find a working button.

<script>evilcode</script>

is even executed.

Has anyone dealt with this before me and can give me a hint to an elegant solution here?


Solution

  • I'm going to go out on a limb here and say you don't have htmlspecialchars() around the output when you load previously submitted data into your form. You should, though, since it's still text for a textarea. The text is being interpreted as HTML by your WYSIWYG, but don't confuse that with actual HTML. :)

    As consolation, know that this confusion is extremely common (it keeps happening) and there are many, many people with a problem exactly like the one you describe.

    Let's take a look at the workflow and where things likely went wrong:

    Problem Workflow

    When someone types the literal text <tag> into the rich text area of your loaded WYSIWYG editor, the editor sees that someone wants to put the HTML &lt;tag&gt; into the message.

    When someone writes bold text (that is, text formatted as bold) into the rich text area, the editor sees that someone wants to put the HTML <b>bold text</b> (or comparable) into the message.

    Meanwhile, in the background, the text &lt;tag&gt; <b>bold text</b> (or whatever) is being stored in a textarea. To preserve the text as text in an HTML context, it's encoded with HTML-encoding, invisibly turning it into &amp;lt;tag&amp;gt; &lt;b&gt;bold text&lt;/b&gt;.

    However, when your submit button is pressed, the text of the textarea (&lt;tag&gt; <b>bold text</b>) is sent to your server, since the form data itself of course isn't HTML encoded (it's not embedded in HTML) - it's just a set of keys and values, and you wanted the value of the textarea.

    Now, when you're building HTML in your server-side application to load up the message again for further editing, you want the value of the field to be HTML encoded, since you're putting that value into an HTML context. What you were previously doing is creating <textarea>&lt;tag&gt; <b>bold text</b></textarea>, which is putting HTML into an HTML context. In basically all browsers, this makes the textarea take on the value <tag> <b>bold text</b>. Ouch! (Imagine if someone had </textarea> as part of their raw message!)

    To everyone's confusion, WYSIWYG editors are unfortunately good at displaying approximately what you wanted even from that broken value. For most use-cases you won't even notice the difference, which is why this error is so widespread.

    When building the HTML of your page, though, you actually want to build <textarea>&amp;lt;tag&amp;gt; &lt;b&gt;bold text&lt;/b&gt;</textarea>. This makes the textarea take on the value &lt;tag&gt; <b>bold text</b> - that's exactly what you wanted.
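    The difference between the two textarea variants can be sketched in a few lines. This is only an illustration: escapeHtml is a hypothetical JavaScript stand-in for PHP's htmlspecialchars(), and decodeEntities is a rough model of the one level of entity decoding a browser applies to the markup inside a textarea, not a real parser.

```javascript
// Hypothetical stand-in for PHP's htmlspecialchars(): encodes the characters
// that are special in HTML. "&" has to be replaced first.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Rough model (an assumption, not a real HTML parser) of what the browser does
// with the markup inside <textarea>: it decodes one level of entities.
// "&amp;" has to be decoded last.
function decodeEntities(markup) {
  return markup
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, "&");
}

// The value the form submitted: the literal text "<tag>" plus real bold markup.
const submitted = "&lt;tag&gt; <b>bold text</b>";

// Wrong: the raw value is dropped into the page unencoded.
const unsafeMarkup = `<textarea>${submitted}</textarea>`;
// Right: the value is encoded for the HTML context first.
const safeMarkup = `<textarea>${escapeHtml(submitted)}</textarea>`;

// What the textarea (and therefore the editor) ends up holding in each case.
const valueSeenUnsafe = decodeEntities(submitted);
const valueSeenSafe = decodeEntities(escapeHtml(submitted));

console.log(unsafeMarkup);
console.log(safeMarkup);
console.log(valueSeenUnsafe); // the literal "<tag>" text has become a real tag
console.log(valueSeenSafe);   // identical to what was submitted
```

    Only the encoded variant round-trips: the textarea holds exactly the value that was submitted, so text stays text and markup stays markup.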

    Your Current Solution, And Why It Breaks

    The solution you currently have runs the submitted text through htmlspecialchars_decode(), which turns &lt;tag&gt; into <tag>, thereby letting HTML Purifier eliminate it. You no longer need to worry about &lt;tag&gt; being interpreted as <tag> in the context of the WYSIWYG.

    However, you unfortunately have two problems:

    1) People can no longer submit messages about tags without HTML Purifier stripping them. Depending on the use case of your textarea, this may not be a problem. Maybe you don't want people to be able to submit HTML messages like "If you're making your own website, you can use &lt;script src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.js" language="javascript"&gt; instead of hosting the jquery.js yourself". With your current solution, HTML Purifier would sanitise a message like that to "If you're making your own website, you can use instead of hosting the jquery.js yourself".

    2) Much more dangerously, people can still hack you! Try writing the text &lt;script&gt;alert(1);&lt;/script&gt; into your editor (so the editor sees the HTML you want to submit as &amp;lt;script&amp;gt;alert(1);&amp;lt;/script&amp;gt;) and hitting submit. Your solution will turn this into &lt;script&gt;alert(1);&lt;/script&gt;, which you'll put into your <textarea> and then you're unfortunately back to square one.
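    To make that bypass concrete, here is a minimal sketch of the two decoding steps. decodeEntities is a hypothetical JavaScript stand-in for one level of htmlspecialchars_decode(), and it is reused as a rough model of the browser decoding the unencoded textarea content. HTML Purifier itself is not modelled; the sketch only relies on the assumption that a tag-based sanitiser has nothing to remove when the string contains no "<".

```javascript
// Hypothetical stand-in for one level of PHP's htmlspecialchars_decode().
// "&amp;" has to be decoded last so that only one level is removed.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, "&");
}

// The user typed the literal characters "&lt;script&gt;..." into the editor,
// so the editor submits them with the ampersands encoded once more.
const submitted = "&amp;lt;script&amp;gt;alert(1);&amp;lt;/script&amp;gt;";

// Step 1: the current solution decodes before purifying.
const decoded = decodeEntities(submitted);

// The decoded string still contains no "<", so there is no tag to strip.
const containsTag = decoded.includes("<");

// Step 2: the value is written into <textarea> unencoded, and the browser
// decodes one more level when it parses the page.
const textareaValue = decodeEntities(decoded);

console.log(decoded);
console.log(containsTag);
console.log(textareaValue); // a real script tag is handed to the editor
```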

    Actual Solution

    Remove your htmlspecialchars_decode() solution (but keep purifying!) and instead put htmlspecialchars() around your output when you write the stored value back into the textarea. Your WYSIWYG will still work and you won't bypass HTML Purifier's sanitisation any more.
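    As a rough sketch of that fix applied to the script example above: the stored value is encoded once when the edit form is built, and nothing is decoded on the server. escapeHtml again stands in for htmlspecialchars(), while renderTextarea, the field name "content" and decodeEntities (a simple model of the browser's parsing) are assumptions made for illustration.

```javascript
// Hypothetical stand-in for PHP's htmlspecialchars().
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Rough model of the browser decoding one level of entities inside <textarea>.
function decodeEntities(markup) {
  return markup
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, "&");
}

// Hypothetical helper that builds the textarea for the edit form.
function renderTextarea(storedValue) {
  return `<textarea name="content">${escapeHtml(storedValue)}</textarea>`;
}

// The same submission as in the attack: literal "&lt;script&gt;..." text,
// stored as submitted (no htmlspecialchars_decode() step).
const stored = "&amp;lt;script&amp;gt;alert(1);&amp;lt;/script&amp;gt;";

const markup = renderTextarea(stored);

// Extract the textarea content and decode it the way the browser would.
const inner = markup.slice(markup.indexOf(">") + 1, markup.lastIndexOf("</textarea>"));
const textareaValue = decodeEntities(inner);

console.log(markup);
console.log(textareaValue === stored); // the editor gets back exactly what was stored
```

    Because the editor receives the stored value unchanged, the typed characters stay inert text and never turn into a script element.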