I am using lxml to parse XML from an external service that has namespaces, but doesn't register them with xmlns. I am trying to register it by hand with register_namespace, but that doesn't seem to work.
from lxml import etree
xml = """
<Foo xsi:type="xsd:string">bar</Foo>
"""
etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml) # lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined
What am I missing? Oddly enough, looking at the lxml source code to try and understand what I might be doing wrong, it seems as if the xsi namespace should already be there as one of the default namespaces.
When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).
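To illustrate that point, here is a minimal sketch of a parse-and-serialize round trip. The `x` prefix and the `http://example.com/demo` URI are made up for this example; the document declares its own prefix, and a different prefix is registered for the same URI beforehand.

```python
from lxml import etree

# Register a preferred prefix for the (made-up) namespace URI.
etree.register_namespace("abc", "http://example.com/demo")

# The parsed document already declares its own prefix "x" for that URI.
el = etree.fromstring('<x:Foo xmlns:x="http://example.com/demo"/>')

# Serializing keeps the prefix from the document, not the registered one.
serialized = etree.tostring(el).decode()
print(serialized)
```

The serialized result should still use the `x` prefix, since the registered prefix only matters for elements that lxml has to invent a prefix for.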
If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.
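If the external service cannot be changed, one possible workaround is to supply the missing declarations yourself before parsing. The following is only a sketch. The `wrapper` element name is hypothetical, and the approach assumes the input is a bare fragment with no XML declaration or DOCTYPE, and that the prefixes really mean the standard XML Schema namespaces.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

# Fragment as received: the xsi prefix is used but never declared.
xml = '<Foo xsi:type="xsd:string">bar</Foo>'

# Wrap the fragment in a hypothetical element that declares the prefixes.
# Assumption: the fragment has no XML declaration, so plain string
# concatenation yields a well-formed document.
wrapped = f'<wrapper xmlns:xsi="{XSI}" xmlns:xsd="{XSD}">{xml}</wrapper>'

root = etree.fromstring(wrapped)
foo = root[0]  # the original element, now namespace-well-formed

print(foo.tag, foo.text)
print(foo.get(f"{{{XSI}}}type"))
```

Only the `xsi` declaration is needed for parsing to succeed. The `xsd` declaration is included so that code which later resolves the `xsd:string` attribute value can find that prefix as well.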
register_namespace defines the prefixes to be used when serializing a newly created XML document.
Example (without register_namespace):
from lxml import etree
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<ns0:Foo xmlns:ns0="http://example.com"/>
Example (with register_namespace):
from lxml import etree
etree.register_namespace("abc", "http://example.com")
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<abc:Foo xmlns:abc="http://example.com"/>
Example (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):
from lxml import etree
el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
print(etree.tostring(el).decode())
Output:
<xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>