
Python: Getting parent attribute from child attribute in xml


I have an XML file, area.xml:

<area>
<controls>
    <internal>yes</internal>
</controls>
<schools>
    <school id="001"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5">
        <venture>
            <basicData type="class">
                <wage numberOfDollars="13" Correction="4.61">
                    <tax>70</tax>
                </wage>
            </basicData>
        </venture>
    </venture>
    <venture index="9">
        <venture>
            <basicData type="class">
                <wage numberOfDollars="13" Correction="5.61">
                    <tax>70</tax>
                </wage>
            </basicData>
        </venture>
    </venture>
    <school id="056"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5">
        <venture>
            <basicData type="class">
                <wage numberOfDollars="13">
                    <tax>70</tax>
                </wage>
            </basicData>
        </venture>
    </venture>
    <venture index="9">
        <venture>
            <basicData type="class">
                <wage numberOfDollars="13">
                    <tax>70</tax>
                </wage>
            </basicData>
        </venture>
    </venture>
</schools>
</area>

What I am trying to achieve with Python: in a school node there are multiple wage nodes (leaves). If one or more of those wage nodes have an attribute called Correction, I want the id attribute value of the school node.

So the output of my script should be 001, because that school has the Correction attribute in its wage node (leaf).

First I tried it using ElementTree:

import xml.etree.ElementTree as ET
data_file = 'area.xml'
tree = ET.parse(data_file)
root = tree.getroot()


t1 = "school"
t2 = "wage"

for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    if hasattr(e2,'Correction'):
        e2.Correction
        print (e1.attrib['id'])

But that didn't work. Now I am trying to reach my goal using minidom, but I find it quite hard.

This is my code so far:

from xml.dom import minidom

doc = minidom.parse("area.xml")

staffs = doc.getElementsByTagName("wage")
for wage in staffs:
    sid = wage.getAttribute("Correction")
    print("wage:%s" % sid)

The output gives the Correction value of every wage node, with an empty value for the wages that don't have the attribute:

wage:4.61
wage:5.61
wage:
wage:

That is obviously far from what I need.
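The empty `wage:` lines come from minidom's `getAttribute`, which returns an empty string when the attribute is absent; `hasAttribute` tells the two cases apart. The sketch below only narrows the output to the corrected wages and still doesn't say which school they belong to. For self-containment it parses a compacted, inlined copy of area.xml (the `XML` string is an assumption standing in for the file); in practice you would keep `minidom.parse("area.xml")`.

```python
from xml.dom import minidom

# Compacted copy of area.xml so the sketch runs on its own.
XML = """<area>
<controls><internal>yes</internal></controls>
<schools>
    <school id="001"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
        <wage numberOfDollars="13" Correction="4.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <venture index="9"><venture><basicData type="class">
        <wage numberOfDollars="13" Correction="5.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <school id="056"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
        <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
    <venture index="9"><venture><basicData type="class">
        <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
</schools>
</area>"""

doc = minidom.parseString(XML)

# getAttribute() returns "" for a missing attribute (the empty "wage:" lines);
# hasAttribute() keeps only the wage elements that really carry Correction.
corrections = [
    wage.getAttribute("Correction")
    for wage in doc.getElementsByTagName("wage")
    if wage.hasAttribute("Correction")
]
print(corrections)  # ['4.61', '5.61']
```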

I could use some help getting pointed in the right direction.

I am using Python 3.

Thank you in advance.


Solution

  • in a school node there are multiple wage nodes

    Not really. The school elements are empty; the wage descendants belong to the venture elements that follow each school as siblings. Since wage is not a descendant of school, selecting the corresponding school is a little tricky.

    If you can use lxml, you could use XPath to select the wage elements that have a Correction attribute, then select the first preceding school element and get its id attribute:

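    from lxml import etree

    tree = etree.parse("area.xml")

    schools_with_corrected_wages = set()

    # Every wage element that carries a Correction attribute...
    for corrected_wage in tree.xpath(".//wage[@Correction]"):
        # ...maps to the nearest school before it in document order
        # (preceding is a reverse axis, so [1] is the closest one).
        schools_with_corrected_wages.add(corrected_wage.xpath("preceding::school[1]/@id")[0])

    print(schools_with_corrected_wages)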
    

    This prints:

    {'001'}
    

    You could also use lxml to process the XML with XSLT...

    XSLT 1.0 (test.xsl)

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:key name="corrected_wage_by_school" match="wage[@Correction]" use="preceding::school[1]/@id"/>
    
      <xsl:template match="/">
        <xsl:for-each select="//school[key('corrected_wage_by_school',@id)]">
          <xsl:value-of select="concat(@id,'&#xA;')"/>
        </xsl:for-each>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Python

    from lxml import etree

    tree = etree.parse("area.xml")
    xslt = etree.parse("test.xsl")
    # Apply the stylesheet; its text output is one school id per line.
    result = tree.xslt(xslt)

    print(result)
    

    This prints...

    001
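If lxml isn't an option, the same "nearest preceding school" idea can be sketched with the standard library's xml.etree.ElementTree. The ElementTree attempt in the question fails for two reasons: `hasattr(e2, 'Correction')` looks for a Python attribute on the Element object rather than an XML attribute (those live in `e2.attrib` or `e2.get('Correction')`), and `zip` pairs the n-th school with the n-th wage by position, not by document structure. ElementTree's XPath subset has no `preceding` axis, so this sketch walks the children of `schools` in document order and remembers the last school it saw. It assumes, as in the sample, that every school is a direct child of schools and comes before its venture siblings. The function name `schools_with_corrected_wages` is hypothetical, and the XML is an inlined, compacted copy of area.xml; in practice you would pass `ET.parse("area.xml").getroot()`.

```python
import xml.etree.ElementTree as ET

# Compacted copy of area.xml so the sketch runs on its own.
XML = """<area>
<controls><internal>yes</internal></controls>
<schools>
    <school id="001"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
        <wage numberOfDollars="13" Correction="4.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <venture index="9"><venture><basicData type="class">
        <wage numberOfDollars="13" Correction="5.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <school id="056"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
        <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
    <venture index="9"><venture><basicData type="class">
        <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
</schools>
</area>"""


def schools_with_corrected_wages(root):
    """Return ids (in document order, no duplicates) of schools whose
    following venture siblings contain a wage with a Correction attribute."""
    result = []
    current_school = None
    # Iterating over an Element yields its direct children in document order.
    for child in root.find("schools"):
        if child.tag == "school":
            # Siblings belong to this school until the next <school> appears.
            current_school = child.get("id")
        elif child.tag == "venture" and current_school is not None:
            # ElementTree's limited XPath does support [@attr] predicates.
            if child.find(".//wage[@Correction]") is not None:
                if current_school not in result:
                    result.append(current_school)
    return result


root = ET.fromstring(XML)
print(schools_with_corrected_wages(root))  # ['001']
```

Unlike the lxml version, which collects a set, this returns a list so the ids keep their document order.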