php html security owasp esapi

OWASP ESAPI encodeForHTML with some allowed formatting tags


In a web project, we use OWASP ESAPI in PHP for output encoding. In some places, we would like to allow a subset of HTML for basic formatting (for example, <i> and <b>), while all other tags and special characters are entity-encoded using the &...; syntax.

I see the following possibilities to achieve this:

  1. Tell the OWASP ESAPI encoder to white-list these tags, so that it properly encodes only the other HTML tags and entities. This doesn't seem to be supported, although we could perhaps write a patch that adds it.
  2. Decode the white-listed tags after encoding with ESAPI. Can this be attacked?
  3. Use some other output encoding technique for this use case. Are there other libraries?

In particular, I need the following tags and attributes to be white-listed:

  • <br>
  • <i>
  • <b>
  • <u>
  • <big>
  • <small>
  • <sub>
  • <sup>
  • <font color="...">
  • <ul> + <li>
  • <ol> + <li>

Please note that our application is security-critical. Any method we implement must accept only the tags above (and maybe a few more formatting-only tags); everything else has to be entity-encoded properly. It should be easy to verify this beyond doubt by reading the (simple) code or an explanation of it. The shorter the code, the easier the review. Fully hand-crafted encoders are a poor fit for this.
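
To make option 2 concrete, here is a minimal sketch of "encode everything, then restore only the exact encoded form of each white-listed tag". It rests on several assumptions: `encodeWithAllowedTags` is a hypothetical helper, not an ESAPI function; `htmlspecialchars()` stands in for the ESAPI encoder call so the snippet runs on its own; and the accepted color values (a `#RRGGBB` hex code or a plain name made of letters) are a guess at what is needed. The sketch does not check that tags are closed or properly nested, so it is a starting point for review rather than a vetted answer to "can this be attacked?".

```php
<?php
// Hypothetical helper, not part of OWASP ESAPI.
function encodeWithAllowedTags(string $input): string
{
    // Step 1: entity-encode everything. htmlspecialchars() is a stand-in
    // (assumption) for the project's ESAPI encoder call.
    $encoded = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');

    // Step 2: restore attribute-free tags, but only when the encoded text
    // is exactly "&lt;tag&gt;" or "&lt;/tag&gt;". A tag carrying any
    // attribute never matches and therefore stays encoded.
    $simple = 'br|i|b|u|big|small|sub|sup|ul|ol|li';
    $encoded = preg_replace('~&lt;(/?)(' . $simple . ')&gt;~', '<$1$2>', $encoded);

    // Step 3: restore <font color="..."> only for a strictly limited value:
    // "#RRGGBB" or a name of 1 to 20 ASCII letters (assumed value set).
    $encoded = preg_replace(
        '~&lt;font color=&quot;(#[0-9a-fA-F]{6}|[a-zA-Z]{1,20})&quot;&gt;~',
        '<font color="$1">',
        $encoded
    );
    $encoded = str_replace('&lt;/font&gt;', '</font>', $encoded);

    return $encoded;
}

echo encodeWithAllowedTags('<b>hi</b><script>alert(1)</script>'), "\n";
// prints: <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Because the ampersand itself is encoded in step 1, user input that already contains the text `&lt;b&gt;` becomes `&amp;lt;b&amp;gt;` and is not restored in steps 2 and 3.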


Solution

  • It sounds like what you are actually looking for is HTMLPurifier

    http://htmlpurifier.org/

    FWIW, I am not affiliated with HTMLPurifier at all; I am the project leader of the OWASP ESAPI project.
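
For illustration, a minimal configuration sketch for the tag list in the question might look like the following. The assumptions are: the library is installed through Composer (package `ezyang/htmlpurifier`), so `vendor/autoload.php` exists; the default doctype is left unchanged, which as far as I know still permits legacy elements such as `<font>` and `<u>`; and `Core.EscapeInvalidTags` is switched on because, by default, HTMLPurifier removes disallowed tags instead of entity-encoding them. The variable `$untrustedInput` is a placeholder.

```php
<?php
// Assumption: installed via Composer, so the autoloader lives here.
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// White-list exactly the elements (and the one attribute) from the question.
$config->set('HTML.Allowed', 'br,i,b,u,big,small,sub,sup,font[color],ul,ol,li');

// Escape disallowed tags as text instead of silently dropping them.
$config->set('Core.EscapeInvalidTags', true);

$purifier = new HTMLPurifier($config);

// Placeholder for whatever untrusted string the application received.
$untrustedInput = '<b>bold</b> <script>alert(1)</script>';

echo $purifier->purify($untrustedInput);
```

The white-list is a single configuration string, which keeps the part that needs security review short.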