Search code examples
javaxmldomxml-parsing

An invalid XML character (Unicode: 0xc) was found


Parsing an XML file using the Java DOM parser results in:

[Fatal Error] os__flag_8c.xml:103:135: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

Solution

  • There are a few characters that are dissallowed in XML documents, even when you encapsulate data in CDATA-blocks.

    If you generated the document you will need to entity encode it or strip it out. If you have an errorneous document, you should strip away these characters before trying to parse it.

    See dolmens answer in this thread: Invalid Characters in XML

    Where he links to this article: http://www.w3.org/TR/xml/#charsets

    Basically, all characters below 0x20 is disallowed, except 0x9 (TAB), 0xA (CR?), 0xD (LF?)