
OWASP java-html-sanitizer - policy for unclosed tags


I am using the EbayPolicyExample.java example from the OWASP java-html-sanitizer. However, I run into a problem when the user writes something like "let's consider that x <n then we have ...": everything after <n is removed.

How can I fix this so that the string <n is preserved?

I tried to change the policy a little, but without success, and it is not possible to guess which letter the user will type after <.

Note: n is just an example; the fix should work for any letter (e.g. <o, <i, <y).

  • input: let's consider that x <n then we have ...
  • actual output: let's consider that x
  • expected output: let's consider that x <n then we have ...

Link to the example code: https://raw.githubusercontent.com/OWASP/java-html-sanitizer/master/src/main/java/org/owasp/html/examples/EbayPolicyExample.java


Solution

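  • You are supposed to provide valid HTML in your input. < is a special character in HTML: it denotes the start of a tag. In your input, everything from <n onward is therefore read as part of a tag rather than as text, and since the policy does not allow that tag, it is dropped.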

    You have to replace < with the corresponding HTML entity: &lt; in your input.

    If the input is entered by the user, you have to pre-process it and make sure it actually is valid HTML, i.e. escape < (and the other HTML special characters) before passing it to the sanitizer. This post might be useful: Recommended method for escaping HTML in Java
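As a rough sketch of that pre-processing step, the helper below replaces the HTML special characters with entities, so "x <n then" becomes "x &lt;n then" before it reaches the sanitizer. The class and method names (HtmlEscape, escapeHtml) are hypothetical and not part of java-html-sanitizer. Library helpers such as Apache Commons Text's StringEscapeUtils.escapeHtml4 or Guava's HtmlEscapers.htmlEscaper() do the same job if you already depend on those libraries.

```java
// Hypothetical helper (not part of java-html-sanitizer): escapes user text
// so that characters like '<' are treated as text instead of tag markup.
final class HtmlEscape {

    private HtmlEscape() {}

    /** Replaces the characters that have special meaning in HTML with entities. */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // so a literal '&' is not misread as an entity
                case '<':  out.append("&lt;");   break; // so "<n" is not parsed as the start of a tag
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break; // safe inside quoted attribute values
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "let's consider that x <n then we have ...";
        // Escape first, then hand the result to your PolicyFactory as before.
        System.out.println(escapeHtml(input));
    }
}
```

Note that escaping every < also means users can no longer enter any of the tags the Ebay policy allows; they will be shown literally. If you need to accept some markup and still keep stray < characters, the pre-processing step has to be more selective than this sketch.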