python, xml, parsing, minidom

Parsing XML with minidom using if conditions


 <Game:quit>

           <Game:AnimalSet AnimalSet="Name" />
           <Game:Value Value="Lion" />

       </Game:quit>
       <Game:quit>

           <Game:AnimalSet AnimalSet="Name" />
           <Game:Value Value="Tiger" />

       </Game:quit>
       <Game:quit>

           <Game:AnimalSet AnimalSet="Name" />
           <Game:Value Value="Leopard" />

       </Game:quit>
       <Game:quit>

           <Game:DimensionSet AnimalSet="Name" />
           <Game:Value Value="Elephant" />

       </Game:quit>

   <Game:quit>

          <Game:AnimalSet AnimalSet="Place" />
          <Game:Value Value="USA" />

This is the chunk of my sample.xml that I am mainly concerned with. I want to parse it with minidom (from xml.dom import minidom) and run if conditions on it: if the AnimalSet value is "Name", its values ("Lion", "Tiger", "Leopard" and "Elephant") should be appended to one list; elif the AnimalSet value is "Place", "USA" should be appended to another list.

I am stuck right at the beginning of the code, so I would really appreciate help getting started.

Please ask if anything is still unclear. Thanks.


Solution

  • This looks like a job for XPath, so as an alternative to minidom you can use ElementTree, whose findall method can find all elements that have a Value attribute.

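    import xml.etree.ElementTree as ET
    
    # path_to_xml_file: location of the XML file, e.g. sample.xml
    doc = ET.parse(path_to_xml_file)
    # './/*[@Value]' selects every descendant element that has a Value attribute
    values = doc.findall('.//*[@Value]')
    print([value.get('Value') for value in values])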
    

    The ElementTree version bundled with Python 2.6.6 does not support attribute predicates such as [@Value], so you must select the elements another way. There must be an xmlns:Game namespace declaration at the beginning of the file; copy its value into an xmlns variable and try the following.

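    import xml.etree.ElementTree as ET
    
    xmlns = '...'  # placeholder: paste the value of xmlns:Game here
    doc = ET.parse(path)  # path: location of the XML file
    # Clark notation ({uri}Value) avoids relying on findall's namespaces
    # argument, which older ElementTree versions may not accept
    values = doc.findall('.//{%s}Value' % xmlns)
    print([value.get('Value') for value in values])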
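
For the if/elif grouping the question asks about, here is a minimal minidom sketch. It assumes each Game:quit block holds one Game:AnimalSet and one Game:Value element, as in the sample. The wrapper element Game:root, the namespace URI http://example.com/game and the helper name group_values are made up for illustration; the real root element and URI come from sample.xml.

```python
from xml.dom import minidom

# Hypothetical wrapper element and namespace URI; the real ones come from
# the root element and xmlns:Game declaration in sample.xml.
SAMPLE = """<Game:root xmlns:Game="http://example.com/game">
  <Game:quit>
    <Game:AnimalSet AnimalSet="Name" />
    <Game:Value Value="Lion" />
  </Game:quit>
  <Game:quit>
    <Game:AnimalSet AnimalSet="Name" />
    <Game:Value Value="Tiger" />
  </Game:quit>
  <Game:quit>
    <Game:AnimalSet AnimalSet="Place" />
    <Game:Value Value="USA" />
  </Game:quit>
</Game:root>"""


def group_values(dom):
    """Return (names, places) collected from each Game:quit block."""
    names, places = [], []
    # minidom matches on the qualified tag name, prefix included
    for block in dom.getElementsByTagName('Game:quit'):
        sets = block.getElementsByTagName('Game:AnimalSet')
        values = block.getElementsByTagName('Game:Value')
        if not sets or not values:
            continue  # skip blocks that lack either element
        kind = sets[0].getAttribute('AnimalSet')
        value = values[0].getAttribute('Value')
        if kind == 'Name':
            names.append(value)
        elif kind == 'Place':
            places.append(value)
    return names, places


names, places = group_values(minidom.parseString(SAMPLE))
print(names)
print(places)
```

To run it on the real file, pass minidom.parse('sample.xml') to group_values instead of the parsed string. Note that the "Elephant" block in the sample uses a Game:DimensionSet element rather than Game:AnimalSet, so this sketch skips it; collecting it would need an extra lookup for that tag.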