java xml parsing libxml2

Is there an online LIBXML2 XML parser available or a way to parse XML with libxml2 standalone?


We are working on a module that parses XML with the libxml2 component, and we have found an issue when the XML contains a namespace with a non-ASCII character such as é.

Sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <Helloé xmlns="http://schemas/Helloé">
      <ns0:Helloé xmlns:ns0="http://schemas/Helloé" />
    </Helloé>
  </SOAP:Body>
</SOAP:Envelope>

We confirmed with a small test program that the DOM parser supports this. When we checked the same document with the W3Schools online XML parser, however, we got the following error:

[image: screenshot of the error message]

We also tested it with other online tools, and they report the same error message.

Can anyone tell us whether there is an online tool or resource we can use to pin this behaviour down to libxml2?

Or a sample program that can test this?


Solution

  • Simply run the file through libxml2's xmllint on the command line:

    $ xmllint --noout so.xml
    so.xml:4: namespace error : xmlns: 'http://schemas/Helloé' is not a valid URI
        <Helloé xmlns="http://schemas/Helloé">
                                               ^
    so.xml:5: namespace error : xmlns:ns0: 'http://schemas/Helloé' is not a valid URI
          <ns0:Helloé xmlns:ns0="http://schemas/Helloé" />
                                                         ^
    

    Replacing é with its UTF-8 percent-escape makes the error go away: change the namespace URI to http://schemas/Hello%C3%A9.
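
To illustrate the percent-escaping step, here is a minimal Python sketch that turns the namespace URI from the question into the escaped form mentioned above. It uses only the standard library's `urllib.parse.quote`; the helper name `escape_namespace_uri` is made up for this example, and the choice of which characters to leave unescaped is an assumption, not something libxml2 prescribes.

```python
from urllib.parse import quote


def escape_namespace_uri(uri: str) -> str:
    """Percent-encode non-ASCII characters in a URI as UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # Leave URI delimiters and any existing '%' escapes untouched, so an
    # already-escaped URI passes through unchanged.
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%", encoding="utf-8")


print(escape_namespace_uri("http://schemas/Helloé"))
# prints: http://schemas/Hello%C3%A9
```

Keep in mind that XML namespace names are compared as plain strings, so the escaped and unescaped URIs name two different namespaces. Whoever produces and whoever consumes the document would both need to use the escaped form.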