Tags: python, xml, python-3.x, xml-parsing, autosar

Parsing XML in python and deletion of containers


I'm trying to write a Python script that goes through an XML file and removes every container element whose child node has a particular value. For instance, my tree looks like:

<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>

Q1

The whole <ECUC-NUMERICAL-PARAM-VALUE> container should be removed if the text of its child node <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF"> equals /AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport

The script I have written is:

import xml.etree.ElementTree as ET
tree = ET.parse('autosar1.xml')
root = tree.getroot()
for child in root.findall(".//ECUC-NUMERICAL-PARAM-VALUE"):
    for z in child.findall(".//DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF']"):
        if z.text == "/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport":
            child.remove(z)         
tree.write('output.xml')

But I am not getting the intended results. The result I am getting is:

<collection shelf="New Arrivals">
<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>
</collection>

The result I want to get:

<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>

Q2

Instead of hardcoding the node text in the if condition, is it possible to get the desired output from user input (at the command prompt, maybe)? For example, the user would enter only "ComIPduCancellationSupport", not the whole value "/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport".

Thanks a lot.


Solution

  • Consider lxml, a feature-rich and easy-to-use third-party library for processing XML and HTML in Python. You can install it with pip, or from a binary package on Windows. I recommend it because it supports W3C-conformant XPath 1.0 and XSLT 1.0, and XSLT is what is useful here.

    XSLT is a special-purpose language for transforming XML files, which includes removing nodes conditionally. Specifically, we run the Identity Transform (which copies the entire document as is) and add an empty template that matches the node we intend to remove, so that node and everything inside it is left out of the output. Notice the use of contains() to check for a string anywhere in the text of that node. No for loop or if logic is needed with this approach.

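    With Python's lxml we can build a dynamic XSLT script (which, by the way, is itself an XML document) from a string and pass a value such as CoolantTemp_T into contains(). That value can come from user input. Notice the {0} placeholder for str.format(). The example below targets INTEGER-TYPE containers with a COMPU-METHOD-REF child; for the XML in the question, the match would use ECUC-NUMERICAL-PARAM-VALUE and DEFINITION-REF (with DEST='ECUC-BOOLEAN-PARAM-DEF') instead.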

    Python

    import lxml.etree as et
    doc = et.parse('Input.xml')
    
    xsl_str='''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                                             xmlns:doc="http://autosar.org/3.0.2">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <!-- IDENTITY TRANSFORM -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
      <!-- EMPTY TEMPLATE -->
      <xsl:template match="INTEGER-TYPE[descendant::COMPU-METHOD-REF/@DEST='COMPU-METHOD' and 
                                        contains(descendant::COMPU-METHOD-REF, '{0}')]">    
      </xsl:template>
    
    </xsl:stylesheet>'''
    
    # LOAD DYNAMIC XSL STRING (PASSING BELOW STRING INTO ABOVE)
    xsl = et.fromstring(xsl_str.format('CoolantTemp_T'))
    
    transform = et.XSLT(xsl)
    result = transform(doc)
    
    # OUTPUT TO SCREEN
    print(result)    
    # OUTPUT TO FILE
    with open('output.xml', 'wb') as f:
        f.write(result)
    

    Output

    <?xml version="1.0"?>
    <TOP-LEVEL-PACKAGES>
      <AR-PACKAGE>
        <SHORT-NAME>DataType</SHORT-NAME>
        <ELEMENTS>
          <INTEGER-TYPE>
            <SHORT-NAME>EngineSpeed_T</SHORT-NAME>
            <SW-DATA-DEF-PROPS>
              <COMPU-METHOD-REF DEST="COMPU-METHOD">/DataType/DataTypeSemantics/EngineSpeed_T</COMPU-METHOD-REF>
            </SW-DATA-DEF-PROPS>
            <LOWER-LIMIT INTERVAL-TYPE="CLOSED">0</LOWER-LIMIT>
            <UPPER-LIMIT INTERVAL-TYPE="CLOSED">65535</UPPER-LIMIT>
          </INTEGER-TYPE>
          <INTEGER-TYPE>
            <SHORT-NAME>VehicleSpeed_T</SHORT-NAME>
            <SW-DATA-DEF-PROPS>
              <COMPU-METHOD-REF DEST="COMPU-METHOD">/DataType/DataTypeSemantics/VehicleSpeed_T</COMPU-METHOD-REF>
            </SW-DATA-DEF-PROPS>
            <LOWER-LIMIT INTERVAL-TYPE="CLOSED">0</LOWER-LIMIT>
            <UPPER-LIMIT INTERVAL-TYPE="CLOSED">65535</UPPER-LIMIT>
          </INTEGER-TYPE>
          <INTEGER-TYPE>
            <SHORT-NAME>Percent_T</SHORT-NAME>
            <SW-DATA-DEF-PROPS>
              <COMPU-METHOD-REF DEST="COMPU-METHOD">/DataType/DataTypeSemantics/Percent_T</COMPU-METHOD-REF>
            </SW-DATA-DEF-PROPS>
            <LOWER-LIMIT INTERVAL-TYPE="CLOSED">0</LOWER-LIMIT>
            <UPPER-LIMIT INTERVAL-TYPE="CLOSED">255</UPPER-LIMIT>
          </INTEGER-TYPE>
        </ELEMENTS>
        <SUB-PACKAGES>
          <AR-PACKAGE>
            <SHORT-NAME>DataTypeSemantics</SHORT-NAME>
            <ELEMENTS>
              <COMPU-METHOD>
                <SHORT-NAME>EngineSpeed_T</SHORT-NAME>
                <UNIT-REF DEST="UNIT">/DataType/DataTypeUnits/rpm</UNIT-REF>
                <COMPU-INTERNAL-TO-PHYS>
                  <COMPU-SCALES>
                    <COMPU-SCALE>
                      <COMPU-RATIONAL-COEFFS>
                        <COMPU-NUMERATOR>
                          <V>0</V>
                          <V>1</V>
                        </COMPU-NUMERATOR>
                        <COMPU-DENOMINATOR>
                          <V>8</V>
                        </COMPU-DENOMINATOR>
                      </COMPU-RATIONAL-COEFFS>
                    </COMPU-SCALE>
                  </COMPU-SCALES>
                </COMPU-INTERNAL-TO-PHYS>
              </COMPU-METHOD>
              <COMPU-METHOD>
                <SHORT-NAME>VehicleSpeed_T</SHORT-NAME>
                <UNIT-REF DEST="UNIT">/DataType/DataTypeUnits/kph</UNIT-REF>
                <COMPU-INTERNAL-TO-PHYS>
                  <COMPU-SCALES>
                    <COMPU-SCALE>
                      <COMPU-RATIONAL-COEFFS>
                        <COMPU-NUMERATOR>
                          <V>0</V>
                          <V>1</V>
                        </COMPU-NUMERATOR>
                        <COMPU-DENOMINATOR>
                          <V>64</V>
                        </COMPU-DENOMINATOR>
                      </COMPU-RATIONAL-COEFFS>
                    </COMPU-SCALE>
                  </COMPU-SCALES>
                </COMPU-INTERNAL-TO-PHYS>
              </COMPU-METHOD>
              <COMPU-METHOD>
                <SHORT-NAME>Percent_T</SHORT-NAME>
                <UNIT-REF DEST="UNIT">/DataType/DataTypeUnits/Percent</UNIT-REF>
                <COMPU-INTERNAL-TO-PHYS>
                  <COMPU-SCALES>
                    <COMPU-SCALE>
                      <COMPU-RATIONAL-COEFFS>
                        <COMPU-NUMERATOR>
                          <V>0</V>
                          <V>0.4</V>
                        </COMPU-NUMERATOR>
                        <COMPU-DENOMINATOR>
                          <V>1</V>
                        </COMPU-DENOMINATOR>
                      </COMPU-RATIONAL-COEFFS>
                    </COMPU-SCALE>
                  </COMPU-SCALES>
                </COMPU-INTERNAL-TO-PHYS>
              </COMPU-METHOD>
              <COMPU-METHOD>
                <SHORT-NAME>CoolantTemp_T</SHORT-NAME>
                <UNIT-REF DEST="UNIT">/DataType/DataTypeUnits/DegreeC</UNIT-REF>
                <COMPU-INTERNAL-TO-PHYS>
                  <COMPU-SCALES>
                    <COMPU-SCALE>
                      <COMPU-RATIONAL-COEFFS>
                        <COMPU-NUMERATOR>
                          <V>-40</V>
                          <V>1</V>
                        </COMPU-NUMERATOR>
                        <COMPU-DENOMINATOR>
                          <V>2</V>
                        </COMPU-DENOMINATOR>
                      </COMPU-RATIONAL-COEFFS>
                    </COMPU-SCALE>
                  </COMPU-SCALES>
                </COMPU-INTERNAL-TO-PHYS>
              </COMPU-METHOD>
            </ELEMENTS>
          </AR-PACKAGE>
          <AR-PACKAGE>
            <SHORT-NAME>DataTypeUnits</SHORT-NAME>
            <ELEMENTS>
              <UNIT>
                <SHORT-NAME>rpm</SHORT-NAME>
                <DISPLAY-NAME>rpm</DISPLAY-NAME>
              </UNIT>
              <UNIT>
                <SHORT-NAME>kph</SHORT-NAME>
                <DISPLAY-NAME>kph</DISPLAY-NAME>
              </UNIT>
              <UNIT>
                <SHORT-NAME>Percent</SHORT-NAME>
                <DISPLAY-NAME>Percent</DISPLAY-NAME>
              </UNIT>
              <UNIT>
                <SHORT-NAME>DegreeC</SHORT-NAME>
                <DISPLAY-NAME>DegreeC</DISPLAY-NAME>
              </UNIT>
            </ELEMENTS>
          </AR-PACKAGE>
        </SUB-PACKAGES>
      </AR-PACKAGE>
    </TOP-LEVEL-PACKAGES>
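
For the XML shown in the question, the same removal can probably also be done with only the standard library's xml.etree.ElementTree. The script in the question calls child.remove(z), which deletes the DEFINITION-REF from its container instead of deleting the container from its own parent. The sketch below is an illustration, not part of the original answer: remove_containers, SAMPLE and the parameter names are hypothetical, the sample is a shortened copy of the question's tree, and the substring test is an assumption chosen to mirror contains() in the XSLT.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the tree from the question (assumption: same structure).
SAMPLE = """<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>"""


def remove_containers(root, keyword,
                      container="ECUC-NUMERICAL-PARAM-VALUE",
                      ref_path=".//DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF']"):
    """Remove every `container` element holding a matching reference.

    A reference matches when `keyword` occurs anywhere in its text.
    Returns the number of containers removed.
    """
    removed = 0
    # ElementTree elements have no link to their parent, so visit every
    # element as a potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):  # iterate over a copy while removing
            if child.tag != container:
                continue
            refs = child.iterfind(ref_path)
            if any(keyword in (ref.text or "") for ref in refs):
                parent.remove(child)  # drop the whole container
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_containers(root, "ComIPduCancellationSupport")
print(f"removed {count} container(s)")
print(ET.tostring(root, encoding="unicode"))
```

For Q2, the keyword could be read from sys.argv[1] or input() and passed to remove_containers in place of the literal. Because the test is a substring match, a short keyword such as "ComIPdu" would match every path in the sample; comparing against the last path segment (ref.text.rsplit("/", 1)[-1]) would be stricter. For a file on disk, ET.parse(...).getroot() and tree.write(...) would replace the string-based calls used here.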