Tags: python, beautifulsoup, xml-parsing, html-parsing

Parsing AUTOSAR ARXML using Beautiful Soup or any other method in Python


I am working with AUTOSAR files, which use the .arxml format. From the .arxml file below, I want to parse some data (the DTC values, e.g. 112068).

.arxml:

  <ECUC-CONTAINER-VALUE>
   <SHORT-NAME>DTC_AD</SHORT-NAME>
   <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass</DEFINITION-REF>
   <PARAMETER-VALUES>
    <ECUC-NUMERICAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-INTEGER-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC</DEFINITION-REF>
     <VALUE>112068</VALUE>
    </ECUC-NUMERICAL-PARAM-VALUE>
    <ECUC-TEXTUAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-STRING-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTCDescription</DEFINITION-REF>
     <VALUE>AD temp</VALUE>
    </ECUC-TEXTUAL-PARAM-VALUE>
    <ECUC-NUMERICAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-INTEGER-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTCFunctionalUnit</DEFINITION-REF>
     <VALUE>1</VALUE>
    </ECUC-NUMERICAL-PARAM-VALUE>
   </PARAMETER-VALUES>
  </ECUC-CONTAINER-VALUE>
  <ECUC-CONTAINER-VALUE>
   <SHORT-NAME>DTC_Lost</SHORT-NAME>
   <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass</DEFINITION-REF>
   <PARAMETER-VALUES>
    <ECUC-NUMERICAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-INTEGER-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC</DEFINITION-REF>
     <VALUE>126630</VALUE>
    </ECUC-NUMERICAL-PARAM-VALUE>
    <ECUC-TEXTUAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-STRING-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTCDescription</DEFINITION-REF>
     <VALUE>LostCOMM</VALUE>
    </ECUC-TEXTUAL-PARAM-VALUE>
    <ECUC-NUMERICAL-PARAM-VALUE>
     <DEFINITION-REF DEST="ECUC-INTEGER-PARAM-DEF">/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTCFunctionalUnit</DEFINITION-REF>
     <VALUE>1</VALUE>
    </ECUC-NUMERICAL-PARAM-VALUE>
   </PARAMETER-VALUES>
  </ECUC-CONTAINER-VALUE>

I have also tried the code below, but I did not get the desired output:

from bs4 import BeautifulSoup as Soup

def diff_method():
    with open('Dem_PRJ_8CH_EcucValues.arxml') as f:
        handler = f.read()
    soup = Soup(handler, "html.parser")  # html.parser lowercases tag names
    for ecuc_container in soup.find_all('ecuc-container-value'):
        for def_ref in ecuc_container.find_all('definition-ref'):
            #print(def_ref.get_text())
            if def_ref.get_text() == '/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC':
                # get_text() on the container returns all of its nested text
                print(ecuc_container.get_text())

if __name__ == "__main__":

    diff_method()

Expected output:

112068
126630
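The attempt above prints too much because ecuc_container.get_text() returns all of the text nested inside <ECUC-CONTAINER-VALUE>, not only the matching <VALUE>. A minimal sketch of a fix that keeps the same Beautiful Soup / html.parser approach is to print the <value> sibling of each matching <definition-ref> instead. The helper name extract_dtcs is hypothetical, and the sketch assumes (as in the sample) that each DemDTC <VALUE> sits in the same parameter element as its <DEFINITION-REF>. It loops over <definition-ref> tags directly rather than over every container, so nested sub-containers do not make the same tag match twice.

```python
from bs4 import BeautifulSoup

# Full DEFINITION-REF path of the DemDTC parameter (copied from the question).
DTC_REF = '/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC'


def extract_dtcs(arxml_text):
    """Return the DemDTC <VALUE> texts found in arxml_text (hypothetical helper)."""
    # html.parser lowercases tag names, so the searches below use lowercase.
    soup = BeautifulSoup(arxml_text, 'html.parser')
    dtcs = []
    # Visit each <definition-ref> once, even if containers are nested.
    for def_ref in soup.find_all('definition-ref'):
        if def_ref.get_text(strip=True) == DTC_REF:
            # Take only the <value> next to the matching <definition-ref>,
            # not the text of the whole container.
            value = def_ref.find_next_sibling('value')
            if value is not None:
                dtcs.append(value.get_text(strip=True))
    return dtcs
```

To use it on the real file, read Dem_PRJ_8CH_EcucValues.arxml into a string with open(...).read() and print each item of extract_dtcs(text).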

Solution

  • If the handler variable holds the XML text from the question, you can use this example to get the values from the <value> tags:

    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(handler, 'html.parser')
    
    for definition in soup.select('definition-ref:has(~ value)'):
        if definition.get_text(strip=True) == '/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC':
            print(definition.find_next('value').text)
    

    Prints:

    112068
    126630
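Since the question also asks about other methods, here is a minimal sketch with the standard-library xml.etree.ElementTree, with no Beautiful Soup involved. The helper name extract_dtcs_etree is hypothetical; the sketch assumes each DemDTC <VALUE> is inside the same <ECUC-NUMERICAL-PARAM-VALUE> as its <DEFINITION-REF>, and the '{*}' namespace wildcard requires Python 3.8 or later. The wildcard matters because real ARXML files usually declare a default AUTOSAR namespace on the root <AUTOSAR> element, which ElementTree folds into every tag name.

```python
import xml.etree.ElementTree as ET

# Full DEFINITION-REF path of the DemDTC parameter (copied from the question).
DTC_REF = '/AUTOSAR_Dem/EcucModuleDefs/Dem/DemConfigSet/DemDTCClass/DemDTC'


def extract_dtcs_etree(root):
    """Return the DemDTC <VALUE> texts found anywhere below root (hypothetical helper)."""
    dtcs = []
    # Unlike html.parser, ElementTree keeps the original upper-case tag names.
    # '{*}' matches a tag in any namespace or in none (Python 3.8+), so this
    # works both for the un-namespaced sample and for namespaced files.
    for param in root.findall('.//{*}ECUC-NUMERICAL-PARAM-VALUE'):
        ref = param.find('{*}DEFINITION-REF')
        value = param.find('{*}VALUE')
        if ref is None or value is None:
            continue
        if (ref.text or '').strip() == DTC_REF:
            dtcs.append((value.text or '').strip())
    return dtcs
```

For a real file, root = ET.parse('Dem_PRJ_8CH_EcucValues.arxml').getroot() gives the element to pass in. The fragment in the question has two top-level elements, so it would need a wrapper element around it before ET.fromstring accepts it.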