Tags: base64, tinymce, owasp, html-sanitizing

Sanitize HTML containing a base64 image (and convert it back to an image)


I have a web app with a TinyMCE HTML editor that lets users paste in HTML from a web page. Pasted images are embedded as base64. Before saving the user input to the DB, I use the OWASP java-html-sanitizer to discard potentially dangerous code (JavaScript, ...).

After sanitizing, some characters in the image's base64 string are escaped, and when I try to get the image back (using Apache Commons Codec's Base64) I don't get a valid image.

Here is my code for decoding the image:

byte[] b;
String s = html;
b = s.getBytes(Utility.UTF8);
b = org.apache.commons.codec.binary.Base64.decodeBase64(b);

For the HtmlSanitizer I have done nothing special: I just followed the Ebay Policy Example, allowing base64 images as suggested here.


Solution

  • Ah, as suggested here, I need "to HTML decode before base64 decoding".

    I tried Apache Commons Lang's StringEscapeUtils:

    org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(html);
    

    and it works. Great.
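For anyone who wants to see the order of operations end to end, here is a minimal sketch using only the JDK. It assumes the sanitizer escaped base64 characters such as + and = as numeric character references (for example &#43; and &#61;), and that the input is the value of the img src attribute in data URI form. The class and method names are made up for illustration, and named entities such as &amp; are not handled; StringEscapeUtils.unescapeHtml4 covers those too.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OWASP java-html-sanitizer or Apache Commons.
final class SanitizedBase64Decoder {

    // Matches hexadecimal (&#x2b;) and decimal (&#43;) character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Step 1: HTML-decode. Only numeric references are handled in this sketch.
    static String unescapeNumericEntities(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Step 2: strip the "data:...;base64," prefix (if present) and base64-decode.
    static byte[] decodeDataUri(String escapedSrc) {
        String src = unescapeNumericEntities(escapedSrc);
        int comma = src.indexOf(',');
        String payload = comma >= 0 ? src.substring(comma + 1) : src;
        return Base64.getDecoder().decode(payload.trim());
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xfb, (byte) 0xff}; // base64: "+/8="
        // The same payload with '+' and '=' escaped as numeric references.
        String escaped = "data:image/png;base64,&#43;/8&#61;";
        byte[] decoded = decodeDataUri(escaped);
        System.out.println(Arrays.equals(original, decoded)); // prints true
    }
}
```

Passing the escaped string straight to a base64 decoder is what breaks the image: the characters of &#43; are not the + the encoder wrote, so the decoded bytes are wrong or the decoder rejects the input.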