Search code examples
jaxbutf-16moxyjava-11xerces

Java 11 UTF-16 BOM issues with com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl


I have a UTF-16 XML file:

<?xml version="1.0" encoding="utf-16" standalone="yes"?>

It starts with BOM FE FF.

Migrating my code to Java 11, I get:

Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652) ~[?:?]

This is unmarshalling it using JAXB.

It happens whether I use the Reference Implementation:

Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652) ~[?:?]
    at com.sun.xml.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:134) ~[jaxb-runtime-2.4.0-SNAPSHOT.jar:?]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:385) ~[jaxb-runtime-2.4.0-SNAPSHOT.jar:?]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:356) ~[jaxb-runtime-2.4.0-SNAPSHOT.jar:?]

or MOXy:

Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652) ~[?:?]
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:98) ~[org.eclipse.persistence.core-2.5.2.jar:?]
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:86) ~[org.eclipse.persistence.core-2.5.2.jar:?]
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:895) ~[org.eclipse.persistence.core-2.5.2.jar:?]
    at org.eclipse.persistence.oxm.XMLUnmarshaller.unmarshal(XMLUnmarshaller.java:659) ~[org.eclipse.persistence.core-2.5.2.jar:?]
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:585) ~[org.eclipse.persistence.moxy-2.5.2.jar:?]

They both use com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl

Unmarshalling that file worked fine using Java 6 to 8. Did something change in Java 9 or 11?

If I remove the FE FF BOM, it unmarshals OK with Java 11.


Solution

  • Turns out my problem was caused by maven-resources-plugin, with filtering set to true. That was mangling any UTF-16 resource, changing the first 2 bytes to EF BF.