I'm integrating Redactor (a WYSIWYG editor) on my website and it outputs HTML instead of BBCode or Markdown. I need to allow the following tags as it uses them for formatting:
<code><span><div><label><a><br><p><b><i><del><strike><u><img><video><audio><iframe><object><embed><param><blockquote><mark><cite><small><ul><ol><li><hr><dl><dt><dd><sup><sub><big><pre><code><figure><figcaption><strong><em><table><tr><td><th><tbody><thead><tfoot><h1><h2><h3><h4><h5><h6>
From what I've read and been told on here, in order to safely display the content I should store the original data in my database, along with a sanitized version (output by HTML Purifier) which is what I will actually output (the unsanitized version being there in case anything goes wrong when sanitizing it).
My question is, should I call strip_tags() on the data as well (passing the above tags as the allowed tags argument), or should I pass it directly to HTML Purifier?
While it's true that you can likely reduce the parsing work that a parser like HTML Purifier does by filtering out tags before the fact, there's no security gain in using strip_tags() first, and in your use case it likely isn't going to make much of a difference.
The reason it won't make much of a difference is that your average submitted content will not be malicious: it will be submitted via your WYSIWYG editor, which only generates the tags that you already want to allow. As such, the preliminary strip_tags() run wouldn't strip out any tags from those submissions.
Meanwhile, a malicious submission is likely to bypass any benefit strip_tags() would give you anyway. However, using strip_tags() before the parser won't do harm, and it could help guard against attempts to use the parser against you by making it eat up a lot of resources. That said, if the parser can be made to cause issues (I'd expect it to have safeguards against that), it tends to happen through nesting depth, not through which tags are used.
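To make the point about malicious input concrete, here is a minimal sketch using only PHP's built-in strip_tags(). The shortened allow-list and the sample payloads are made up for illustration; they are not taken from Redactor or HTML Purifier. It shows that attributes on allowed tags, and deeply nested allowed tags, pass through strip_tags() unchanged, which is why the real filtering still has to happen in HTML Purifier.

```php
<?php
// Hypothetical, shortened allow-list for illustration only;
// the real list from the question is much longer.
$allowed = '<p><a><img><div>';

// 1. Attributes on allowed tags are not touched by strip_tags(),
//    so event handlers and javascript: URLs survive.
$malicious = '<p>Hi</p><img src="x" onerror="alert(1)">'
           . '<a href="javascript:alert(2)">click</a>';
$stripped = strip_tags($malicious, $allowed);
var_dump($stripped === $malicious); // bool(true)

// 2. Disallowed tags are removed, but the text inside them is kept.
$withScript = '<p>Hi</p><script>alert(3)</script>';
echo strip_tags($withScript, $allowed), "\n"; // <p>Hi</p>alert(3)

// 3. Deep nesting of allowed tags is passed along as-is,
//    so strip_tags() does not limit nesting depth for the parser.
$nested = str_repeat('<div>', 5000) . 'x' . str_repeat('</div>', 5000);
var_dump(strip_tags($nested, $allowed) === $nested); // bool(true)
```

In other words, strip_tags() only filters by tag name, so anything harmful that fits inside an allowed tag reaches the next stage regardless.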
In brief:
I see no reason to recommend it in your case, but I see no reason to dissuade you from using it, either. strip_tags() is pretty fast and it won't mangle anything if you use it before the parser.