python xml xml-parsing elementtree xml.etree

Why does my XML change when I use the write method of Python's xml.etree library?


I am trying to edit an XML file using the xml.etree library.

My XML:

<ext:UBLExtensions>
    <ext:UBLExtension>
        <ext:ExtensionContent>
        </ext:ExtensionContent>
    </ext:UBLExtension>
</ext:UBLExtensions>

My Python code:

import xml.etree.ElementTree as gfg

tree = gfg.parse('file_name.xml')
root = tree.getroot()
tree.write("file_name.xml")

I haven't changed anything, but my XML becomes this:

<ns1:UBLExtensions>
    <ns1:UBLExtension>
        <ns1:ExtensionContent>
        </ns1:ExtensionContent>
    </ns1:UBLExtension>
</ns1:UBLExtensions>

Why did the tag prefix change? How can I avoid this?


Solution

  • The two documents you've posted are equivalent, as long as both prefixes map to the same namespace URI. When you have something like this:

    <document xmlns:doc="http://example.com/document/v1.0">
      <doc:title>An example</doc:title>
    </document>
    

    Then that <doc:title> element means "<title> in the http://example.com/document/v1.0 namespace". When you parse the document, your XML parser doesn't particularly care about the prefix, and it will generate a new prefix when writing out the document...

    ...unless you configure an explicit prefix mapping, which we can do with the register_namespace function. For example:

    import xml.etree.ElementTree as etree
    
    etree.register_namespace("ext", "http://example.com/extensions")
    
    tree = etree.parse("data.xml")
    tree.write("out.xml")
    

    If data.xml contains:

    <example xmlns:ext="http://example.com/extensions">
      <ext:UBLExtensions>
        <ext:UBLExtension>
          <ext:ExtensionContent>
          </ext:ExtensionContent>
        </ext:UBLExtension>
      </ext:UBLExtensions>
    </example>
    

    Then the above code will output:

    <example xmlns:ext="http://example.com/extensions">
      <ext:UBLExtensions>
        <ext:UBLExtension>
          <ext:ExtensionContent>
          </ext:ExtensionContent>
        </ext:UBLExtension>
      </ext:UBLExtensions>
    </example>
    

    Without the call to etree.register_namespace, the output looks like:

    <example xmlns:ns0="http://example.com/extensions">
      <ns0:UBLExtensions>
        <ns0:UBLExtension>
          <ns0:ExtensionContent>
          </ns0:ExtensionContent>
        </ns0:UBLExtension>
      </ns0:UBLExtensions>
    </example>
    

    It's the same document, and the elements are all still in the same namespace; we're just using a different prefix as the short name of the namespace.
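
If the namespace URI isn't known ahead of time, one possible approach is to read the prefix declarations from the document itself and register each one before writing. The sketch below is only an illustration: `register_declared_namespaces` is a hypothetical helper name, not part of ElementTree, and the XML is held in a string rather than a file so the example is self-contained. It relies on `iterparse` emitting a `(prefix, uri)` pair for every `start-ns` event. Note that `register_namespace` changes a global registry, so it affects all later serialization in the same process.

```python
import io
import xml.etree.ElementTree as ET

# Same sample document as in the answer above.
SOURCE = """<example xmlns:ext="http://example.com/extensions">
  <ext:UBLExtensions>
    <ext:UBLExtension>
      <ext:ExtensionContent>
      </ext:ExtensionContent>
    </ext:UBLExtension>
  </ext:UBLExtensions>
</example>"""


def register_declared_namespaces(xml_text):
    """Register every prefix declared in xml_text; return {prefix: uri}.

    Hypothetical helper for illustration, not an ElementTree API.
    """
    declared = {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield a (prefix, uri) tuple per xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        try:
            ET.register_namespace(prefix, uri)
        except ValueError:
            # Prefixes of the form ns0, ns1, ... are reserved by ElementTree.
            continue
        declared[prefix] = uri
    return declared


declared = register_declared_namespaces(SOURCE)
root = ET.fromstring(SOURCE)
output = ET.tostring(root, encoding="unicode")

# Internally the tag stores the namespace URI, not the prefix.
print(root[0].tag)
print(output)
```

The first `print` shows the tag in its `{uri}localname` form, which is why the prefix chosen at write time does not change the document's meaning. With files, the same helper idea would apply by passing the file name to `ET.iterparse` before calling `tree.write`.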