Search code examples
pythonxmlpython-2.7arcpy

Python extract data from xml and save it to excel


I would like to extract some data from an XML file and save it in a table format, such as XLS or DBF.

Here is XML file i have:

<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header />
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <CNTROLAREA>
        <BSR>
          <VERB>ADD</VERB>
          <NOUN>LandIndex</NOUN>
          <REVISION>001</REVISION>
        </BSR>
      </CNTROLAREA>
      <DATAAREA>
        <LandIndex>
          <reportId>AMI100031</reportId>
          <requestKey>R3278458</requestKey>
          <SubmittedBy>EN4871</SubmittedBy>
          <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
          <LandIndex>
            <agreementdetail>
              <agreementid>001       4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
              <currentstatus>
                <status>ACTIVE</status>
                <statuseffectivedate>1965/02/18</statuseffectivedate>
                <termdate>1965/02/18</termdate>
              </currentstatus>
              <designatedrepresentative>
              </designatedrepresentative>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I am interested in information inside the agreementdetail tag which is under DATAAREA/LandIndex/LandIndex/

UPDATE:

Thanks to MattDMo this task has moved a bit from its dead point. So I made this script below. It iterates the file and gets all instances of the agreementdetail and outputs agreementid and agreementtype for each.

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.findall(".//agreementdetail")
result = []
elements = ('agreementid', 'agreementtype')

for a in agreement:
    obj = {}
    for e in elements:
        obj[e] = a.find(e).text
    result.append(obj)

arcpy.AddMessage(result)

The output I am receiving consists of a bunch of this strings: {'agreementid': '001 4860', 'agreementtype': 'NATURAL GAS'}

Now I need to convert this output into a table format (.csv, .dbf, .xls etc.) so that agreementid and agreementtype are columns:

agreementid    | agreementtype 
001       4860 | NATURAL GAS

I will be very grateful if you could guide me on how to accomplish it. Or maybe any example?

P.S. Python version is 2.7


Solution

  • The following should work:

    import xml.etree.ElementTree as ET
    import arcpy
    
    xmlfile = 'D:/Working/Test/Test.xml'
    element_tree = ET.parse(xmlfile)
    root = element_tree.getroot()
    agreement = root.find(".//agreementid").text
    arcpy.AddMessage(agreement)
    

    The root.find() call uses an XPath expression (quick cheatsheet is in the Python docs here) to find the first tag at any level under the current level named agreementid. If there are multiple tags named that in your file, you can use root.findall() and iterate over the results. If, for example, there are three fields named agreementid, and you know you want the second one, then root.findall(".//agreementid")[1] should work.