Tags: python, xml, elementtree, xml-namespaces

How to make ElementTree parse XML without changing namespace declarations?


I have the following XML:

data3.xml:

<?xml version="1.0" encoding="UTF-8"?>
<feed
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <content>
            <ns2:executionresult
                xmlns:ns2="http://jazz.net/xmlns/alm/qm/v0.1/"
                xmlns:ns4="http://purl.org/dc/elements/1.1/">
                <ns4:title>xxx</ns4:title>
            </ns2:executionresult>
        </content>
    </entry>
</feed>

I have the following code:

test3.py:

import xml.etree.ElementTree as ET
tree = ET.parse('data3.xml')
root = tree.getroot()
xml_str = ET.tostring(root).decode()
print(xml_str)

Output:

$ python3 test3.py
<ns0:feed xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://jazz.net/xmlns/alm/qm/v0.1/">
    <ns0:entry>
        <ns0:content>
            <ns1:executionresult>
                <dc:title>xxx</dc:title>
            </ns1:executionresult>
        </ns0:content>
    </ns0:entry>
</ns0:feed>

Why does ElementTree change the namespace prefixes and declarations on its own? What rule does it follow, and how can I avoid it?


Solution
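As for the rule: xml.etree.ElementTree does not keep prefixes at all. While parsing, it rewrites every tag into {namespace-uri}local-name form (Clark notation) and discards the original prefixes and xmlns declarations. While serializing, it needs a prefix for each URI: it uses one registered in the module if there is one (the module ships with a few well-known ones, including dc for http://purl.org/dc/elements/1.1/, which is why ns4:title came out as dc:title), otherwise it generates ns0, ns1, ... in order of first appearance, and it declares all of them on the root element. The sketch below shows both halves; it inlines data3.xml as a bytes string so it runs without the file.

```python
import xml.etree.ElementTree as ET

# data3.xml from the question, inlined as bytes so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <content>
            <ns2:executionresult
                xmlns:ns2="http://jazz.net/xmlns/alm/qm/v0.1/"
                xmlns:ns4="http://purl.org/dc/elements/1.1/">
                <ns4:title>xxx</ns4:title>
            </ns2:executionresult>
        </content>
    </entry>
</feed>"""

root = ET.fromstring(XML)

# After parsing, each tag is "{namespace-uri}local-name"; the source
# prefixes (the default xmlns, ns2, ns4) are no longer stored anywhere.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)

# On output, ElementTree picks one prefix per URI: a registered one if it
# knows the URI (Dublin Core -> "dc"), else a generated ns0, ns1, ...
# All declarations are written on the root element.
serialized = ET.tostring(root).decode()
print(serialized)
```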

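  • Use the lxml library instead of xml.etree. lxml keeps each element's original prefix and writes the namespace declarations back on the elements where they appeared in the source.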

    Code:

    from lxml import etree
    
    tree = etree.parse('data3.xml')
    root = tree.getroot()
    xml_str = etree.tostring(root, pretty_print=True, encoding='utf-8').decode()
    print(xml_str)
    

    Output:

    <feed xmlns="http://www.w3.org/2005/Atom">
        <entry>
            <content>
                <ns2:executionresult xmlns:ns2="http://jazz.net/xmlns/alm/qm/v0.1/" xmlns:ns4="http://purl.org/dc/elements/1.1/">
                    <ns4:title>xxx</ns4:title>
                </ns2:executionresult>
            </content>
        </entry>
    </feed>
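If you have to stay with the standard library, ET.register_namespace() gets you part of the way. The following is a minimal sketch, again with data3.xml inlined. It restores Atom as the default namespace, but ElementTree rejects prefixes shaped like ns<digits> (they are reserved for its generated names), so ns2 and ns4 cannot be kept; qm below is an illustrative name, not something from the question. Every declaration also still ends up on the root element, which is why lxml is the better fit when the exact declarations matter.

```python
import xml.etree.ElementTree as ET

# data3.xml from the question, inlined as bytes so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <content>
            <ns2:executionresult
                xmlns:ns2="http://jazz.net/xmlns/alm/qm/v0.1/"
                xmlns:ns4="http://purl.org/dc/elements/1.1/">
                <ns4:title>xxx</ns4:title>
            </ns2:executionresult>
        </content>
    </entry>
</feed>"""

ATOM = "http://www.w3.org/2005/Atom"
QM = "http://jazz.net/xmlns/alm/qm/v0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Prefixes of the form ns<digits> are reserved for ElementTree's generated
# names, so the source's own "ns2"/"ns4" prefixes cannot be registered.
try:
    ET.register_namespace("ns2", QM)
    ns2_registered = True
except ValueError:
    ns2_registered = False

# Register prefixes before serializing. The empty prefix makes Atom the
# default namespace again; "qm" is a hypothetical, illustrative choice.
# Registrations are module-global and affect every later serialization.
ET.register_namespace("", ATOM)
ET.register_namespace("qm", QM)
ET.register_namespace("dc", DC)

root = ET.fromstring(XML)
serialized = ET.tostring(root).decode()
print(serialized)
```

Because register_namespace() changes module-level state, registering the prefixes once at program start-up is usually enough. The result is namespace-equivalent to the input, just spelled differently.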