I'm looking at JSoup and the OWASP Java HTML Sanitizer project. I'm only interested in such a tool for preventing XSS attacks by sanitizing user input passed to the API layer. The OWASP project says:
"Passing 95+% of AntiSamy's unit tests plus many more."
But it doesn't tell me where I can see these tests myself. What do these tests cover? More simply, I want to know why these tools default to a whitelist approach.
I'm sure there is a reason for choosing whitelisting over blacklisting. I want to disallow only known XSS-unsafe tags like script and attributes such as on*. A blacklist approach does not even seem to be possible with these tools.
I need to know the reasoning behind this, and I suspect it's in the tests. For example, why disallow style tags? Are they dangerous in terms of XSS, or are they disallowed for some other reason? (style can be XSS-unsafe, as mentioned in the comments: XSS attacks and style attributes)
I'm looking for similar justifications for why other tags are XSS-unsafe. The unit tests themselves should be enough if somebody knows where to find them. Given enough unsafe tags, this should tell me why a whitelist approach is necessary.
The original AntiSamy tests are in AntiSamyTest (antisamy).
They were adapted for the OWASP Java HTML Sanitizer in AntiSamyTest (owasp).
They run the sanitizer against different HTML fragments, for example:
assertSanitizedDoesNotContain("<TABLE BACKGROUND=\"javascript:alert('XSS')\">", "background");
assertSanitizedDoesNotContain("<META HTTP-EQUIV=\"refresh\" CONTENT=\"0;url=data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K\">", "<meta");
See the XSS Evasion Cheat Sheet for some more examples.
We tried blacklists, but we kept finding new tags or attributes that could be used to bypass them, and malformed HTML and alternative encodings were used to bypass filters, making blacklists impractical and ineffective. So now the default assumption is that if a tag, attribute, or style isn't explicitly specified as safe, then it's unsafe. This protects not just against the XSS attacks we already know about, but against many new types as well.
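To make the difference concrete, here is a minimal sketch in plain Java. It is not how AntiSamy, JSoup, or the OWASP sanitizer work internally; the class `BlacklistVsWhitelist`, both filter methods, and the allowed-tag set are hypothetical and exist only for illustration. The blacklist removes exactly what the question proposes (script elements and on* attributes), while the whitelist keeps only a handful of tags with no attributes and escapes everything else.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy filters for illustration only; regex-based filtering is NOT a real sanitizer.
class BlacklistVsWhitelist {

    // Hypothetical blacklist: <script> elements and on* event-handler attributes.
    private static final Pattern SCRIPT_ELEMENT =
            Pattern.compile("(?is)<script\\b.*?</script\\s*>");
    private static final Pattern ON_ATTRIBUTE =
            Pattern.compile("(?i)\\s+on\\w+\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)");

    // Hypothetical whitelist: these tags only, and no attributes at all.
    private static final Set<String> ALLOWED_TAGS = Set.of("b", "i", "p", "em", "strong");
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Removes only the constructs the blacklist knows about.
    static String blacklistFilter(String html) {
        String withoutScripts = SCRIPT_ELEMENT.matcher(html).replaceAll("");
        return ON_ATTRIBUTE.matcher(withoutScripts).replaceAll("");
    }

    // Re-emits allowed tags without attributes, drops other tags, escapes the rest.
    static String whitelistFilter(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(html.substring(last, m.start())));
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (ALLOWED_TAGS.contains(name)) {
                out.append('<').append(m.group(1)).append(name).append('>');
            }
            last = m.end();
        }
        out.append(escape(html.substring(last)));
        return out.toString();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Fragment taken from the AntiSamy-derived test quoted above.
        String attack = "<TABLE BACKGROUND=\"javascript:alert('XSS')\">";
        System.out.println("blacklist: " + blacklistFilter(attack));
        System.out.println("whitelist: " + whitelistFilter(attack));
    }
}
```

The blacklist handles the cases it was written for, but the TABLE BACKGROUND fragment contains neither a script tag nor an on* attribute, so it passes through untouched. The whitelist never has to know about that vector: TABLE and BACKGROUND simply are not on the list. In practice you would express the same idea through the sanitizer library's own policy configuration rather than hand-written regular expressions.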