XSS sanitizing nested html tags input

I'm using the AntiSamy library to sanitize input to my application against XSS. I have a problem with nested tags, such as the input used in the failing test below.

My sanitize method looks like:

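    public String clean(String input) {
        if (input == null) {
            return null;
        }
        // Decode HTML entities first so encoded tags are visible to the scanner.
        input = StringEscapeUtils.unescapeHtml(input);
        try {
            Policy policy = Policy.getInstance(getClass().getResourceAsStream("/antisamy-textonly-policy.xml"));
            AntiSamy antiSamy = new AntiSamy();
            CleanResults cleanResults = antiSamy.scan(input, policy);
            String cleaned = cleanResults.getCleanHTML();
            return StringEscapeUtils.unescapeHtml(cleaned);
        } catch (PolicyException e) {
            throw new IllegalStateException(e); // handler body missing in the original post
        } catch (ScanException e) {
            throw new IllegalStateException(e); // handler body missing in the original post
        }
    }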

My test against this type of input is failing:

    public void doubleTagTest() {
        def cleaned = xss.clean("<<b>script>alert('xss');<</b>/script>");
        assert cleaned.isEmpty();
    }


    Assertion failed:

    assert cleaned.isEmpty()
           |       |
           |       false
           alert('xss');

        at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(

Do you have any idea how to handle this without a recursive call to xss.clean()?


  • AntiSamy is producing the correct result: the badly formed tags are removed, leaving the plain text alert('xss');.

    Consider the following

    <b<i>>Hello World!</b</i>>

    A bold tag and an italic tag have somehow become muddled together. AntiSamy strips the broken tags and leaves the text Hello World!, which is correct. The plain text that remains in your original test looks like JavaScript, but that is of no concern: the harmful <script> tag has been removed, so only inert text is left.
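
Following that reasoning, the test could check that no tag delimiters survive instead of requiring an empty string. The sketch below only illustrates that kind of check with the standard library. `MarkupCheck` and `containsTagDelimiter` are hypothetical names, not part of AntiSamy, and the `cleaned` value is copied from the failed assertion in the question rather than produced by a real scan.

```java
final class MarkupCheck {
    // Hypothetical helper (not an AntiSamy API): returns true if the text
    // still contains a character that could open or close an HTML tag.
    static boolean containsTagDelimiter(String text) {
        return text.indexOf('<') >= 0 || text.indexOf('>') >= 0;
    }

    public static void main(String[] args) {
        // Raw input from the question's test.
        String raw = "<<b>script>alert('xss');<</b>/script>";
        // Assumption: the value shown in the failed assertion output.
        String cleaned = "alert('xss');";

        System.out.println(containsTagDelimiter(raw));     // prints "true"
        System.out.println(containsTagDelimiter(cleaned)); // prints "false"
    }
}
```

In the Groovy test, an equivalent assertion would look something like `assert !cleaned.contains("<")`, which passes for the output shown in the question.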