Python write result of XML ElementTree findall to a file


I want to write Python code that extracts some data from a source XML file and writes it to a new file. My source file looks like this:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">

    <soapenv:Header/>
    <soapenv:Body>
        <SessionID xmlns="http://www.niku.com/xog">12345</SessionID>
        <QueryResult xmlns="http://www.niku.com/xog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <Records>
                <Record>
                  <id>1</id>
                  <date_start>2020-10-04T00:00:00</date_start>
                  <date_end>2020-10-10T00:00:00</date_end>
                  <name>Payne, Max</name>
                </Record>
                <Record>
                  <id>2</id>
                  <date_start>2020-10-04T00:00:00</date_start>
                  <date_end>2020-10-10T00:00:00</date_end>
                  <name>Reno, Jean</name>
                </Record>
            </Records>
        </QueryResult>
    </soapenv:Body>
</soapenv:Envelope>

I want to write the following output to a new XML file:

<Records>
    <Record>
      <id>1</id>
      <date_start>2020-10-04T00:00:00</date_start>
      <date_end>2020-10-10T00:00:00</date_end>
      <name>Payne, Max</name>
    </Record>
    <Record>
      <id>2</id>
      <date_start>2020-10-04T00:00:00</date_start>
      <date_end>2020-10-10T00:00:00</date_end>
      <name>Reno, Jean</name>
    </Record>
</Records>

I was able to get the following result with this code:

import xml.etree.ElementTree as ET

tree = ET.parse('my_file.xml')

root = tree.getroot()

for xtag in root.findall('.//{http://www.niku.com/xog/Query}Record'):
    print(xtag)

Result:

<Element '{http://www.niku.com/xog/Query}Record' at 0x00000216BA69B778>
<Element '{http://www.niku.com/xog/Query}Record' at 0x00000216BA6A3228>

Can anyone help me get from here to the output file shown above?


Solution

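  • In your case, print(xtag) prints the Element object itself, not its XML text. To get the text, serialize the element with the module-level ET.tostring() function (it is a function of the ElementTree module, not a method of the tree); it returns bytes by default, hence the decode() call. Also, it seems you want the whole <Records> block rather than the individual <Record> elements, so you don't need a loop: find() returns the first matching element.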

    import xml.etree.ElementTree as ET
    
    tree = ET.parse('test.xml')
    root = tree.getroot()
    
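    # find() returns the first matching element (or None), so no loop is needed
    records = root.find('.//{http://www.niku.com/xog/Query}Records')
    # tostring() returns bytes by default; decode to get a printable string
    print(ET.tostring(records).decode("utf-8"))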
    

    Output

    <ns0:Records xmlns:ns0="http://www.niku.com/xog/Query">
                    <ns0:Record>
                      <ns0:id>1</ns0:id>
                      <ns0:date_start>2020-10-04T00:00:00</ns0:date_start>
                      <ns0:date_end>2020-10-10T00:00:00</ns0:date_end>
                      <ns0:name>Payne, Max</ns0:name>
                    </ns0:Record>
                    <ns0:Record>
                      <ns0:id>2</ns0:id>
                      <ns0:date_start>2020-10-04T00:00:00</ns0:date_start>
                      <ns0:date_end>2020-10-10T00:00:00</ns0:date_end>
                      <ns0:name>Reno, Jean</ns0:name>
                    </ns0:Record>
                </ns0:Records>
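
The ns0: prefixes appear because ElementTree stores each tag as {namespace-uri}name and invents a prefix when it serializes. If the new file should contain plain <Records> and <Record> tags as in the question, one option is to drop the namespace part of every tag before serializing. The following is a minimal sketch under that assumption: strip_namespaces is a hypothetical helper name, and the inline SOURCE string is a shortened stand-in for the question's file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's source file (assumption for the demo).
SOURCE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body>
        <QueryResult xmlns="http://www.niku.com/xog/Query">
            <Records>
                <Record>
                  <id>1</id>
                  <name>Payne, Max</name>
                </Record>
            </Records>
        </QueryResult>
    </soapenv:Body>
</soapenv:Envelope>"""


def strip_namespaces(element):
    """Remove the '{uri}' part from every tag in the subtree, in place."""
    for node in element.iter():
        if isinstance(node.tag, str) and node.tag.startswith('{'):
            node.tag = node.tag.split('}', 1)[1]
    return element


root = ET.fromstring(SOURCE)
records = root.find('.//{http://www.niku.com/xog/Query}Records')

# encoding='unicode' makes tostring() return str instead of bytes
plain_xml = ET.tostring(strip_namespaces(records), encoding='unicode')
print(plain_xml)
```

Note that this changes the meaning of the data (the elements no longer belong to the Query namespace), so it only fits when the consumer of the new file does not expect that namespace.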
    

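    You could also use the third-party lxml module. It keeps the document's original default namespace instead of generating an ns0 prefix, but it also copies the namespace declarations that were in scope on the ancestor elements onto <Records>.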

    from lxml import etree
    
    tree = etree.parse('test.xml')
    root = tree.getroot()
    
    # Same lookup as with ElementTree; lxml accepts the {uri}tag syntax too
    records = root.find('.//{http://www.niku.com/xog/Query}Records')
    print(etree.tostring(records).decode("utf-8"))
    

    Output

    <Records xmlns="http://www.niku.com/xog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
                    <Record>
                      <id>1</id>
                      <date_start>2020-10-04T00:00:00</date_start>
                      <date_end>2020-10-10T00:00:00</date_end>
                      <name>Payne, Max</name>
                    </Record>
                    <Record>
                      <id>2</id>
                      <date_start>2020-10-04T00:00:00</date_start>
                      <date_end>2020-10-10T00:00:00</date_end>
                      <name>Reno, Jean</name>
                    </Record>
                </Records>
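
Neither snippet above writes a file yet; both only print the serialized text. To finish the original requirement, the extracted element can be wrapped in its own ElementTree and saved with write(). The sketch below is one way to do that with the standard library, assuming the output should keep the Query namespace as a default namespace rather than as ns0: prefixes. The file name records.xml, the temporary directory, and the shortened inline SOURCE are placeholders for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NS = 'http://www.niku.com/xog/Query'

# Shortened stand-in for the question's source file (assumption for the demo).
SOURCE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body>
        <QueryResult xmlns="http://www.niku.com/xog/Query">
            <Records>
                <Record>
                  <id>1</id>
                  <name>Payne, Max</name>
                </Record>
            </Records>
        </QueryResult>
    </soapenv:Body>
</soapenv:Envelope>"""

root = ET.fromstring(SOURCE)
records = root.find(f'.//{{{NS}}}Records')
records.tail = None  # drop the whitespace that followed </Records> in the source

# Placeholder output location; replace with the real target path.
out_path = os.path.join(tempfile.mkdtemp(), 'records.xml')

# Wrap the element in a new tree and write it out. default_namespace makes
# the writer emit xmlns="..." on the root instead of ns0: prefixes; it raises
# ValueError if any element in the subtree has no namespace.
ET.ElementTree(records).write(
    out_path,
    encoding='utf-8',
    xml_declaration=True,
    default_namespace=NS,
)

with open(out_path, encoding='utf-8') as fh:
    written = fh.read()
print(written)
```

If the namespace is not wanted at all, the tags would have to be rewritten before calling write(), and default_namespace should then be omitted.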