org.xml.sax.SAXParseException: Content is not allowed in prolog


I know that general forms of this question have been asked many times. However, none of the existing answers helped me solve my problem, so I am posting this question about my specific case.

I am trying to figure out why I get a SAXParseException (Content is not allowed in prolog.) when the OpenSAML library tries to parse some XML. The most useful hints I found pointed toward an errant BOM at the beginning of the file, but there is nothing like that. I also wrote a quick-and-dirty C# routine that reads the whole file as an array of bytes, iterates over it, and reports whether any byte is >= 0x80 (it found none). The XML declares its encoding as UTF-8. I am hoping that someone can give me some insight into what might be going wrong.

The initial portion of the XML file, as a hex dump, is (note the use of 0A as a newline; removing the line feed character entirely has no apparent effect):

000000000  3C 3F 78 6D 6C 20 76 65-72 73 69 6F 6E 3D 22 31   |<?xml version="1|
000000010  2E 30 22 20 65 6E 63 6F-64 69 6E 67 3D 22 55 54   |.0" encoding="UT|
000000020  46 2D 38 22 3F 3E 0A 3C-6D 64 3A 45 6E 74 69 74   |F-8"?>.<md:Entit|
000000030  79 44 65 73 63 72 69 70-74 6F 72 20 78 6D 6C 6E   |yDescriptor xmln|
000000040  73 3A 6D 64 3D 22 75 72-6E 3A 6F 61 73 69 73 3A   |s:md="urn:oasis:|
000000050  6E 61 6D 65 73 3A 74 63-3A 53 41 4D 4C 3A 32 2E   |names:tc:SAML:2.|
000000060  30 3A 6D 65 74 61 64 61-74 61 22 20               |0:metadata"     |

The stack trace for the root cause exception is:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
    org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    org.opensaml.xml.parse.BasicParserPool$DocumentBuilderProxy.parse(BasicParserPool.java:665)
    my.Unmarshaller.unmarshall(Unmarshaller.java:39)
    ... internal calls omitted for brevity ...
    javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:722)

The code that tries to do the unmarshalling is (type names fully qualified here; hopefully I am not leaving out something important):

package my;

public class Unmarshaller {

    protected static org.opensaml.xml.parse.ParserPool parserPool;

    static {
        org.opensaml.xml.parse.BasicParserPool _parserPool;
        _parserPool = new org.opensaml.xml.parse.BasicParserPool();
        _parserPool.setNamespaceAware(true);
        Unmarshaller.parserPool = _parserPool;
    }

    public Unmarshaller() {
        try {
            org.opensaml.DefaultBootstrap.bootstrap();
        } catch (org.opensaml.xml.ConfigurationException e) {
            throw new java.lang.RuntimeException (e);
        }
    }

    public Object unmarshall(String xml)
    throws org.opensaml.xml.io.UnmarshallingException {
        assert xml != null;
        assert !xml.isEmpty();
        assert Unmarshaller.parserPool != null;

        org.w3c.dom.Document doc;

        try {
            doc =
                (parserPool.getBuilder())
                    .parse( // <<<====== line 39 in original source code is here
                        new org.xml.sax.InputSource(
                            new java.io.StringReader(xml)
                        )
                    );
        } catch (org.xml.sax.SAXException e) {
            throw new org.opensaml.xml.io.UnmarshallingException(e);
        } catch (java.io.IOException e) {
            throw new org.opensaml.xml.io.UnmarshallingException(e);
        } catch (org.opensaml.xml.parse.XMLParserException e) {
            throw new org.opensaml.xml.io.UnmarshallingException(e);
        }

        // ... remainder of function omitted for brevity ...
    }
}

Solution

  • I can't see anything wrong with the XML fragment in the hex dump, and I believe you when you say that the XML file validates.

    However, you have not presented watertight evidence that the XML the parser actually sees is valid. For instance:

    • You might be trying to parse a different file from the one that you have dumped. (These things have been known to happen ...).

    • Alternatively, something might be going wrong in the way that you get the XML into the String that you then parse.

    Try dumping the first few lines of the String that you hand to the parser (the one wrapped in the StringReader), and compare them with the hex dump of the file.
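
    One way to inspect the String is to print its first few code points in hex, because a stray character such as U+FEFF is invisible in ordinary log output. The following is only a sketch: the class name PrologCheck and both helper methods are hypothetical and are not part of OpenSAML or Xerces. It assumes that you can call it with the same String just before it is passed to unmarshall.

```java
// Hypothetical diagnostic helper; not part of OpenSAML or Xerces.
final class PrologCheck {

    /** Renders up to `limit` leading code points of `s` as "U+XXXX" tokens. */
    static String describePrefix(String s, int limit) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        int i = 0;
        while (i < s.length() && count < limit) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
            count++;
        }
        return out.toString();
    }

    /** True only if the very first character of the String is '<'. */
    static boolean startsWithMarkup(String s) {
        return !s.isEmpty() && s.charAt(0) == '<';
    }

    public static void main(String[] args) {
        String clean = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root/>";
        String withBom = "\uFEFF" + clean; // simulated stray BOM character

        System.out.println(describePrefix(clean, 5));   // U+003C U+003F U+0078 U+006D U+006C
        System.out.println(describePrefix(withBom, 5)); // U+FEFF U+003C U+003F U+0078 U+006D
        System.out.println(startsWithMarkup(withBom));  // false
    }
}
```

    If the output for your String does not begin with U+003C U+003F U+0078 (the characters "<?x"), then the String differs from the file you dumped, and the code that builds the String is the place to look.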