java xml jaxb sax

SAXParseException: The element type "hr" must be terminated by the matching end-tag "</hr>" while reading XML with JAXB


I am trying to read the XML at the given link with JAXB, but I keep getting the following exception, even though there is no hr tag in the document.

Here is my code:

            final JAXBContext jaxbContext = JAXBContext.newInstance(EuropeanParliamentMemberResponse.class);

            final Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();

            final JAXBElement<EuropeanParliamentMemberResponse> response = jaxbUnmarshaller.unmarshal(new StreamSource(url), EuropeanParliamentMemberResponse.class);

Here is the Exception:

org.xml.sax.SAXParseException; systemId: http://www.europarl.europa.eu/meps/en/full-list/xml; lineNumber: 6; columnNumber: 3; The element type "hr" must be terminated by the matching end-tag "</hr>".]

What am I doing wrong?


Solution

  • You are getting that error because your URL uses the wrong protocol. Use https instead of http.

    When you use http, the server returns a "301 Moved Permanently" response whose body is a small HTML page:

    <html>
        <head><title>301 Moved Permanently</title></head>
        <body>
            <center>
                <h1>301 Moved Permanently</h1>
            </center>
            <hr>
            <center>nginx</center>
        </body>
    </html>
    

    You can see the <hr> tag that causes the error. In HTML, <hr> has no closing tag, but an XML parser requires every element to be closed, so this page is not well-formed XML.
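
The failure can be reproduced without any network access or JAXB, which may help confirm where the message comes from. The following is only a sketch: it feeds a copy of the redirect page to the JDK's built-in XML parser. The class name `HtmlAsXmlDemo` and the helper `parseErrorMessage` are made up for this illustration, and the exact wording of the message depends on the JDK and its locale.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original question or answer.
class HtmlAsXmlDemo {

    // A copy of the 301 page from the answer, as the XML parser would receive it.
    static final String REDIRECT_BODY =
            "<html>\n"
          + "<head><title>301 Moved Permanently</title></head>\n"
          + "<body>\n"
          + "<center><h1>301 Moved Permanently</h1></center>\n"
          + "<hr>\n"
          + "<center>nginx</center>\n"
          + "</body>\n"
          + "</html>\n";

    // Returns the parser's error message, or null if the text is well-formed XML.
    static String parseErrorMessage(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors such as an unclosed <hr> end up here.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML with an unclosed <hr>: the parser reports an error.
        System.out.println(parseErrorMessage(REDIRECT_BODY));
        // The same element written as XML (<hr/>) parses fine, so this prints null.
        System.out.println(parseErrorMessage("<root><hr/></root>"));
    }
}
```

The JDK parser may also echo the fatal error to standard error before the exception is caught; that is its default error handler and does not affect the result.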

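    A browser follows the redirect to the https URL automatically, so the http URL appears to work there. The XML parser behind your JAXB unmarshaller does not: it tries to parse the HTML body of the 301 response as XML.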

    Assuming you have all the correct JAXB annotations on your class, the code in your question should work (it works for me) with the updated URL:

    https://www.europarl.europa.eu/meps/en/full-list/xml
    

    A couple of suggestions for troubleshooting this type of issue:

    1. Open the home page in a browser: http://www.europarl.europa.eu - you will see that you are redirected to an https URL.

    2. You can extract the redirect response I showed above by using Java's HttpClient (available from Java 11 onwards):

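    // Imports needed by this snippet (all part of the JDK since Java 11):
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.net.http.HttpResponse.BodyHandlers;

    String url = "http://www.europarl.europa.eu/meps/en/full-list/xml";
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .build();
    // Send the request asynchronously, print the body, and wait for completion.
    client.sendAsync(request, BodyHandlers.ofString())
            .thenApply(HttpResponse::body)
            .thenAccept(System.out::println)
            .join();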
    

    HttpClient does not follow redirects by default, so this prints the body of the 301 response, where you can see the redirect message.
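
To see the effect of the redirect policy without depending on the real site, here is a minimal sketch that runs a throwaway local server and fetches the same path twice, once with `HttpClient.Redirect.NEVER` (the default) and once with `HttpClient.Redirect.NORMAL`. Everything in it is an assumption made for illustration: the class name `RedirectDemo`, the `/old` and `/new` paths, and the `<meps/>` placeholder body are invented, and the server uses the JDK-specific `com.sun.net.httpserver` package.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class RedirectDemo {

    // Fetches /old from a local test server and returns "<status> <body>".
    static String fetchOld(HttpClient.Redirect policy) throws Exception {
        // Port 0 lets the operating system pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        // /old imitates the redirecting server: 301 plus a small HTML body.
        server.createContext("/old",
                ex -> respond(ex, 301, base + "/new", "<html><body><hr></body></html>"));
        // /new imitates the real resource; <meps/> is a placeholder document.
        server.createContext("/new", ex -> respond(ex, 200, null, "<meps/>"));
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder().followRedirects(policy).build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(base + "/old")).build();
            HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } finally {
            server.stop(0);
        }
    }

    // Writes a response with an optional Location header.
    static void respond(HttpExchange ex, int status, String location, String body)
            throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        if (location != null) {
            ex.getResponseHeaders().add("Location", location);
        }
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws Exception {
        // Without following redirects, the client sees the 301 and its HTML body.
        System.out.println(fetchOld(HttpClient.Redirect.NEVER));
        // With NORMAL, the client follows the Location header and gets the XML.
        System.out.println(fetchOld(HttpClient.Redirect.NORMAL));
    }
}
```

`Redirect.NORMAL` follows redirects except from https to http, so it would also cover the http-to-https redirect described above. Even so, pointing the unmarshaller directly at the https URL remains the simpler fix.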