php, security, htmlpurifier

Advantages of HTML Purifier over regex filtering


We have recently implemented HTML Purifier in our web-based application. Previously we used regexes to match commonly known XSS injections (script, img, etc.). We realized that this wasn't good enough and so moved to HTML Purifier. Now, given that HTML Purifier is slow (very slow compared to the regex method we had earlier), is it really worth using? Or does it make sense to keep extending the regex filtering until we reach a satisfactory level (though it might be argued that the speed benefit would be gone by then)? Has anyone else faced similar security issues with their web application, and what did you do in the end?

Please let me know if anything seems vague; I would be happy to provide more details.
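To make the weakness of the regex approach concrete, here is a minimal sketch of a blacklist filter. It is hypothetical: naive_filter and its two patterns are illustrative assumptions, not the asker's actual code. It strips the obvious <script> and <img> payloads but lets other payloads through.

```php
<?php
// Hypothetical blacklist filter, loosely modelled on the regex approach in the
// question: it strips <script>...</script> blocks and <img> tags, nothing else.
function naive_filter(string $input): string
{
    // Remove <script>...</script> blocks (case-insensitive, across newlines).
    $out = preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $input);
    // Remove any <img ...> tag.
    return preg_replace('#<img\b[^>]*>#i', '', $out);
}

$payloads = [
    '<script>alert(1)</script>',                  // stripped
    '<img src=x onerror=alert(1)>',               // stripped
    '<svg onload=alert(1)>',                      // <svg> is not on the list
    '<a href="javascript:alert(1)">x</a>',        // no blocked tag at all
    '<scr<script></script>ipt>alert(1)</script>', // removing the inner pair rebuilds <script>
];

foreach ($payloads as $payload) {
    // Print what survives the filter; var_export makes empty results visible.
    echo var_export(naive_filter($payload), true), PHP_EOL;
}
```

The last three payloads come through still executable, and every new tag, attribute, or encoding trick needs yet another pattern, which is why growing the regex list rarely reaches a "satisfactory level".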


Solution

  • Using a regex for HTML/JavaScript? Perhaps you have not seen this epic answer by bobince. In short: if you use a regex, then you have two problems. In fact, the reason HTML Purifier is so slow is that it makes hundreds of calls to preg_match() and preg_replace() in order to clean a message. You should never reinvent the wheel; a home-grown filter will, without a doubt, be less secure.

    The real question is htmlspecialchars($var, ENT_QUOTES) vs. HTML Purifier. HTML Purifier is not only slow, it has also been hacked many times. Don't use HTML Purifier unless there is no other choice: htmlspecialchars solves most problems, and when its output is placed in HTML text or a quoted attribute value, it solves them in a way that cannot be bypassed.
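Here is a minimal sketch of the htmlspecialchars($var, ENT_QUOTES) approach. The wrapper name escape_html is hypothetical, and adding ENT_SUBSTITUTE plus an explicit 'UTF-8' charset is an assumption on my part (it makes invalid UTF-8 get replaced instead of producing an empty string). The last lines show the limit of escaping: a javascript: URL contains nothing that needs escaping, so it still has to be checked separately before it goes into href.

```php
<?php
// Hypothetical helper: escape untrusted text for HTML body text or a quoted
// attribute value. ENT_QUOTES converts both " and ' (to &quot; and &#039;).
function escape_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<scr<script></script>ipt>alert(1)</script>';
$name    = '" onmouseover="alert(1)';

// Both payloads become inert text: tags and quotes are turned into entities.
echo '<p>' . escape_html($comment) . '</p>', PHP_EOL;
echo '<input value="' . escape_html($name) . '">', PHP_EOL;

// Caveat: escaping does not validate URLs. This string has no special HTML
// characters, so it passes through unchanged and would still run in href="...".
$url = 'javascript:alert(1)';
echo '<a href="' . escape_html($url) . '">link</a>', PHP_EOL;
```

Escaping like this is cheap and context-specific. A purifier is only needed when users must be allowed to submit real HTML markup.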