java, xml, jaxb, jaxb2

Java unmarshal xml with illegal XML characters


I'm trying to unmarshal an XML string with javax.xml.bind.Unmarshaller, but I get the following error:

Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13) was found in the element content of the document.

Is there a universal way to remove all illegal XML characters from the input string?

For example, I tried the following, but it doesn't help:

public static String illegalXML11CharactersPattern = "[^"
        + "\u0001-\uD7FF"
        + "\uE000-\uFFFD"
        + "\ud800\udc00-\udbff\udfff"
        + "]+";

public static String stripNonValidXML11Characters(String xml) {
    return xml.replaceAll(illegalXML11CharactersPattern, "");
}

Solution

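  • In the end, I settled on the following approach. According to the Commons Lang Javadoc, escapeXml10 drops the characters that XML 1.0 cannot represent (such as 0x13) while escaping the markup, and unescapeXml then restores the markup:

    xml = org.apache.commons.lang3.StringEscapeUtils.unescapeXml(
            org.apache.commons.lang3.StringEscapeUtils.escapeXml10(xml));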
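
A likely reason the regex from the question had no effect: its ranges follow the XML 1.1 character set, which starts at \u0001 and therefore keeps 0x13, while the parser here is evidently applying the stricter XML 1.0 rules. For anyone who wants to avoid the extra dependency, here is a minimal sketch of a regex that follows the XML 1.0 Char production instead. The class and method names (XmlSanitizer, stripInvalidXml10Chars) are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JAXB or Commons Lang.
final class XmlSanitizer {

    // Matches every code point OUTSIDE the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Java regexes work on code points, so valid surrogate pairs are kept
    // and unpaired surrogates are removed.
    private static final Pattern INVALID_XML10_CHARS = Pattern.compile(
            "[^\\x{9}\\x{A}\\x{D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    private XmlSanitizer() {
    }

    // Returns the input with all characters that XML 1.0 forbids removed.
    static String stripInvalidXml10Chars(String xml) {
        return INVALID_XML10_CHARS.matcher(xml).replaceAll("");
    }
}
```

The cleaned string can then be passed to the unmarshaller as before, for example via unmarshaller.unmarshal(new StringReader(XmlSanitizer.stripInvalidXml10Chars(xml))). Note that this only removes literal characters; a numeric character reference such as &#x13; would still be rejected by an XML 1.0 parser and would need separate handling.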