Tags: python, xml, append, elementtree, pretty-print

Appending an xml-node which is read from file breaks pretty_print for adjacent nodes


I'm generating an XML file with lxml's etree module in Python. One node in the generated file is read from an existing XML file. Appending this element breaks the pretty_print output for the nodes directly before and after it.

from lxml import etree

root = etree.Element("startNode")
subnode1 = etree.SubElement(root, "SubNode1")
subnode1Child1 = etree.SubElement(subnode1, "subNode1Child1")
etree.SubElement(subnode1Child1, "Child1")
etree.SubElement(subnode1Child1, "Child2")

f = open('/xml_testdata/ext_file.xml','r')
ext_xml = etree.fromstring(f.read())
ext_subnode = ext_xml.find("ExtNode")
subnode1.append(ext_subnode)

subnode1Child2 = etree.SubElement(subnode1, "subNode1Child2")
etree.SubElement(subnode1Child2, "Child1")
etree.SubElement(subnode1Child2, "Child2")

tree = etree.ElementTree(root)
tree.write("testfile.xml", xml_declaration=True, pretty_print=True)

which gives this result:

<startNode>
    <SubNode1><subNode1Child1><Child1/><Child2/></subNode1Child1><ExtNode>
            <NodeFromExt>
                <SubNodeFromExt1/>
            </NodeFromExt>
            <NodeFromExt>
                <SubNodeFromExt2/>
                <AnotherSubNodeFromExt2>
                    <SubSubNode/>
                    <AllPrettyHere>
                        <Child/>
                    </AllPrettyHere>
                </AnotherSubNodeFromExt2>
            </NodeFromExt>
    </ExtNode>
    <subNode1Child2><Child1/><Child2/></subNode1Child2></SubNode1>
</startNode>

Not very readable, is it? Even worse when "subNodeChild" contains a lot more subnodes than this example!

Without appending the external elements, it looks like this:

<startNode>
  <SubNode1>
    <subNode1Child1>
      <Child1/>
      <Child2/>
    </subNode1Child1>
    <subNode1Child2>
      <Child1/>
      <Child2/>
    </subNode1Child2>
  </SubNode1>
</startNode>

So the problem is caused by appending the external elements!

Is there a way to append the external elements without breaking the pretty_print-output?


Solution

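  • You can get nicer pretty-printed output by using a parser object that removes ignorable whitespace when parsing the existing XML file. By default the parser keeps the whitespace-only text between tags, and lxml's pretty printer is conservative: it does not add indentation where an element already has text next to it, even if that text is only whitespace. The whitespace carried along with the appended node therefore suppresses the formatting of its siblings.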

    Instead of this:

    f = open('/xml_testdata/ext_file.xml','r')
    ext_xml = etree.fromstring(f.read())
    

    Use this:

    f = open('/xml_testdata/ext_file.xml', 'r')
    parser = etree.XMLParser(remove_blank_text=True)
    ext_xml = etree.fromstring(f.read(), parser)
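
    The difference can be reproduced without an external file. The following is a minimal sketch, not the asker's exact program: `EXT_XML` is a made-up stand-in for the contents of ext_file.xml, `ExtRoot` is an assumed root element name, and `build` is a hypothetical helper that builds a shortened version of the tree from the question with or without the whitespace-stripping parser.

```python
from lxml import etree

# Hypothetical stand-in for the contents of ext_file.xml (assumed layout).
EXT_XML = b"""<ExtRoot>
    <ExtNode>
        <NodeFromExt>
            <SubNodeFromExt1/>
        </NodeFromExt>
    </ExtNode>
</ExtRoot>"""


def build(parser=None):
    """Build a shortened version of the question's tree and pretty-print it."""
    root = etree.Element("startNode")
    subnode1 = etree.SubElement(root, "SubNode1")
    child1 = etree.SubElement(subnode1, "subNode1Child1")
    etree.SubElement(child1, "Child1")

    # With parser=None lxml uses its default parser, which keeps
    # whitespace-only text nodes from the source document.
    ext_xml = etree.fromstring(EXT_XML, parser)
    subnode1.append(ext_xml.find("ExtNode"))

    child2 = etree.SubElement(subnode1, "subNode1Child2")
    etree.SubElement(child2, "Child1")
    return etree.tostring(root, pretty_print=True).decode()


broken = build()  # default parser: whitespace from the file is kept
fixed = build(etree.XMLParser(remove_blank_text=True))  # whitespace dropped

print(broken)
print(fixed)
```

    With the default parser, the siblings of ExtNode should come out crammed onto shared lines as in the question; with `remove_blank_text=True`, each tag should end up on its own consistently indented line.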
    

    See also: