
Merging XML files using Python's ElementTree


I need to merge two XML files on the third block of the XML (the 'results' element). Files A.xml and B.xml look like this:

A.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
    </result>
  </results>
</sample>

B.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
    </result>
  </results>
</sample>

I need to merge on 'results', so that the output looks like this:

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
  </results>
</sample>

What I have done so far is this:

import os, os.path, sys
import glob
from xml.etree import ElementTree

def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        # get root
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('result'):
            if xml_element_tree is None:
                xml_element_tree = data 
            else:
                xml_element_tree.extend(result) 
    if xml_element_tree is not None:
        print(ElementTree.tostring(xml_element_tree, encoding="unicode"))

As you can see, I assign the first file's root (which carries the heading etc.) to xml_element_tree, and then extend it with each 'result' from the other files. However, this gives me this:

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
  </results>
   <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
</sample>

whereas the closing 'results' tag needs to be at the bottom, after the second result block. Any help will be appreciated.


Solution

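  • Although this is mostly a duplicate of another post that already has an answer, I had already written this, so here is the Python code. The key change is to extend the 'results' element of the first file rather than the root: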

    import glob
    from xml.etree import ElementTree
    
    def run(files):
        # Collect every XML file in the directory (sorted for a stable merge order).
        xml_files = sorted(glob.glob(files + "/*.xml"))
        xml_element_tree = None
        insertion_point = None
        for xml_file in xml_files:
            data = ElementTree.parse(xml_file).getroot()
            for results in data.iter('results'):
                if xml_element_tree is None:
                    # The first file becomes the base tree; later files merge into its <results>.
                    xml_element_tree = data
                    insertion_point = xml_element_tree.find("./results")
                else:
                    # extend() appends the children of <results>, i.e. the <result> elements.
                    insertion_point.extend(results)
        if xml_element_tree is not None:
            print(ElementTree.tostring(xml_element_tree, encoding="unicode"))
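
To see why the insertion point matters, here is a minimal self-contained sketch that merges two in-memory documents instead of files. The helper name merge_results and the shortened sample strings are assumptions made for illustration; fromstring, find and extend are standard xml.etree.ElementTree calls. Element.extend appends the children of whatever it is given, so extending the 'results' node with another 'results' node carries the 'result' blocks across intact, whereas extending the root with a 'result' node would append bare 'result_data' elements to 'sample'.

```python
from xml.etree import ElementTree

# Shortened stand-ins for A.xml and B.xml (assumption: one result_data each).
A_XML = """<sample id="1">
  <workflow value="x" version="1"/>
  <results>
    <result type="T">
      <result_data type="value" value="19"/>
    </result>
  </results>
</sample>"""

B_XML = """<sample id="1">
  <workflow value="x" version="1"/>
  <results>
    <result type="Q">
      <result_data type="value" value="11"/>
    </result>
  </results>
</sample>"""


def merge_results(xml_strings):
    """Merge the <result> children of every document into the first one.

    Hypothetical helper, not part of ElementTree.
    """
    base = None
    insertion_point = None
    for text in xml_strings:
        root = ElementTree.fromstring(text)
        results = root.find("results")
        if results is None:
            continue  # nothing to merge from this document
        if base is None:
            # The first document supplies the header and the target <results>.
            base = root
            insertion_point = results
        else:
            # Appends each <result> child of this document's <results>.
            insertion_point.extend(results)
    return base


merged = merge_results([A_XML, B_XML])
merged_types = [result.get("type") for result in merged.find("results")]
print(merged_types)
```

Note the explicit "is None" checks: an Element with no children is falsy, so testing the element directly could skip an empty but present 'results' node.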
    

    However, this question contains another problem not present in the other post. The sample XML files are not well-formed XML: an opening tag cannot assign a value directly to the element name, as in:

    <sample="1">
        ...
    </sample>
    

    Instead, give the value an attribute name, for example:

    <sample id="1">
        ...
    </sample>
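
A quick way to check this point is to feed both forms to the parser. The sketch below is only an illustration; is_well_formed is a hypothetical helper name, while ElementTree.fromstring and ElementTree.ParseError are the standard library's own.

```python
from xml.etree import ElementTree


def is_well_formed(xml_text):
    """Return True if ElementTree can parse the text (hypothetical helper)."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError:
        return False
    return True


# A value attached directly to the element name is rejected by the parser.
bad_ok = is_well_formed('<sample="1"></sample>')
# A named attribute is accepted.
good_ok = is_well_formed('<sample id="1"></sample>')
print(bad_ok, good_ok)
```

Running a check like this over the input files before merging makes it easier to tell a malformed file apart from a bug in the merge logic.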