We are working on a module that parses XML with libxml2 and have run into an issue when the XML declares a namespace whose URI contains a non-ASCII character such as é.
Sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body>
<Helloé xmlns="http://schemas/Helloé">
<ns0:Helloé xmlns:ns0="http://schemas/Helloé" />
</Helloé>
</SOAP:Body>
</SOAP:Envelope>
We confirmed with a small test program that the DOM parser accepts this document. However, when we check the same document with the W3Schools online XML parser, we get the following error:
We also tried other online validators, and they report the same error message.
Can anyone tell us whether there is an online tool or resource that would let us pinpoint this to libxml2? Or is there a sample program that can test this?
Simply run the file through libxml2's xmllint on the command line:
$ xmllint --noout so.xml
so.xml:4: namespace error : xmlns: 'http://schemas/Helloé' is not a valid URI
<Helloé xmlns="http://schemas/Helloé">
^
so.xml:5: namespace error : xmlns:ns0: 'http://schemas/Helloé' is not a valid URI
<ns0:Helloé xmlns:ns0="http://schemas/Helloé" />
^
Also, replacing é in the namespace URI with its UTF-8 percent-escape works. Just change the URI to http://schemas/Hello%C3%A9 in both xmlns declarations.
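If you would rather compute the escaped URI than write it by hand, here is a minimal sketch in Python using only the standard library. The helper name escape_namespace_uri and the choice of characters left unescaped are my own assumptions for illustration; they are not part of libxml2 or xmllint.

```python
from urllib.parse import quote


def escape_namespace_uri(uri: str) -> str:
    """Percent-encode non-ASCII characters in a URI as UTF-8 bytes.

    Hypothetical helper for illustration only. The `safe` set keeps the
    usual URI delimiters and any existing '%' escapes untouched, so only
    characters such as 'é' are rewritten.
    """
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%", encoding="utf-8")


# 'é' is U+00E9, which is the two bytes C3 A9 in UTF-8.
print(escape_namespace_uri("http://schemas/Helloé"))
# prints: http://schemas/Hello%C3%A9
```

Keep in mind that namespace names are compared as plain strings, so the escaped form has to be used consistently everywhere that namespace is referenced. The element name Helloé itself can stay as it is; xmllint only complains about the URI.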