I have a web app with a TinyMCE HTML editor that lets users enter HTML copied from a web page. Images can be pasted and are encoded as base64. Before saving the user input to the DB, I use the OWASP java-html-sanitizer to discard potentially dangerous code (JavaScript, ...).
Some characters in the base64 string of the image get escaped, so when I try to get the image back (using Apache Commons Codec's Base64) I don't get a valid image.
Here is my code for decoding the image:
byte[] b;
String s = html;
b = s.getBytes(Utility.UTF8);
b = org.apache.commons.codec.binary.Base64.decodeBase64(b);
For the HtmlSanitizer I have done nothing special: I just followed the eBay policy example, allowing base64 images as suggested here.
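Ah, as suggested here, I need "to HTML decode before base64 decoding": the sanitizer HTML-escapes some characters of the base64 string, so they have to be unescaped before the base64 decoder sees them.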
I tried Apache Commons Lang's StringEscapeUtils:
org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(html);
and it works. Great.
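For anyone who wants to see the two steps together, here is a minimal sketch using only the JDK. It is not the code from my app. The class and method names (DataUriDecoder, unescapeNumericEntities, decodeImage) are made up for illustration, the tiny regex-based unescaper is a stand-in for StringEscapeUtils.unescapeHtml4 that only handles decimal references, and the sample value assumes the sanitizer turned '+' into &#43; and '=' into &#61;. The payload is a four-byte dummy, not a real image.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataUriDecoder {

    // Matches decimal numeric character references such as &#43; or &#61;.
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d{1,7});");

    /** Stand-in for unescapeHtml4: decodes decimal numeric references only. */
    static String unescapeNumericEntities(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave references to invalid code points untouched.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** HTML-decodes the src value, then base64-decodes the part after "base64,". */
    static byte[] decodeImage(String escapedSrc) {
        String src = unescapeNumericEntities(escapedSrc);
        String marker = "base64,";
        int start = src.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("not a base64 data URI");
        }
        String payload = src.substring(start + marker.length());
        // The JDK decoder is strict: it throws if entity text is still present.
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // Hypothetical sanitized value (assumed escaping of '+' and '=').
        String escaped = "data:image/png;base64,Pj4&#43;Pw&#61;&#61;";
        byte[] bytes = decodeImage(escaped);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // prints >>>?
    }
}
```

In real code I would keep using a library unescaper, since it also covers named and hexadecimal entities. Note that StringEscapeUtils in commons-lang3 has been deprecated since 3.6 in favor of the class of the same name in Apache Commons Text.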