
Registering namespaces with lxml before parsing


I am using lxml to parse XML from an external service that uses namespace prefixes but doesn't declare them with xmlns. I am trying to register the namespace by hand with register_namespace, but that doesn't seem to work.

from lxml import etree

xml = """
    <Foo xsi:type="xsd:string">bar</Foo>
"""

etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml) # lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined

What am I missing? Oddly enough, looking at the lxml source code to try and understand what I might be doing wrong, it seems as if the xsi namespace should already be there as one of the default namespaces.


Solution

  • When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).

    If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.


    register_namespace defines the prefixes to be used when serializing a newly created XML document.

    Example 1 (without register_namespace):

    from lxml import etree
    
    el = etree.Element('{http://example.com}Foo')
    print(etree.tostring(el).decode())
    

    Output:

    <ns0:Foo xmlns:ns0="http://example.com"/>
    

    Example 2 (with register_namespace):

    from lxml import etree
    
    etree.register_namespace("abc", "http://example.com")
    
    el = etree.Element('{http://example.com}Foo')
    print(etree.tostring(el).decode())
    

    Output:

    <abc:Foo xmlns:abc="http://example.com"/>
    

    Example 3 (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):

    from lxml import etree
    
    el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
    print(etree.tostring(el).decode())
    

    Output:

    <xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
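
Example 4 (with register_namespace, but parsing a document that already declares its own prefix). This is a small sketch of the first point above: the prefix `p` and the sample document are made up for illustration, and the serialized result should keep `p` rather than switch to the registered `abc`.

```python
from lxml import etree

# Register a preferred prefix for the namespace URI.
etree.register_namespace("abc", "http://example.com")

# The parsed document already declares the prefix "p" for the same URI.
el = etree.fromstring('<p:Foo xmlns:p="http://example.com"/>')

# Serializing the parsed tree reuses the prefix found in the document.
out = etree.tostring(el).decode()
print(out)
```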
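
Since register_namespace cannot help with parsing, one possible workaround for the input in the question is to supply the missing declarations yourself before handing the text to the parser. The sketch below wraps the fragment in a temporary root element that declares the prefixes. The helper name `parse_fragment` and the `wrapper` element are hypothetical, and the approach assumes the fragment is a single element without its own XML declaration or DOCTYPE.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"


def parse_fragment(fragment, nsmap):
    """Parse an XML fragment whose prefixes are used but never declared.

    Hypothetical helper: it wraps the fragment in a temporary root element
    that declares every prefix in nsmap, then returns the first child.
    """
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in nsmap.items())
    wrapped = f"<wrapper {decls}>{fragment}</wrapper>"
    root = etree.fromstring(wrapped)
    return root[0]


xml = """
    <Foo xsi:type="xsd:string">bar</Foo>
"""

el = parse_fragment(xml, {"xsi": XSI, "xsd": XSD})
print(el.tag, el.text)
print(el.get(f"{{{XSI}}}type"))
```

The returned element is still attached to the temporary wrapper, so its in-scope namespaces come from that parent. If the service ever starts declaring the prefixes itself, the extra declarations on the wrapper are harmless.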