In Python, how can I inspect a specific section of XML and extract the node text?


I'm using minidom to inspect XML which contains a list of debug key listings. An example of the XML is as follows:

<Shortcuts>
  <Item>
    <CommandName>DebugCommandName_1</CommandName>
    <ShortcutKeys>
      <Item>
        <Keys>
          <Item>KEY_1</Item>
          <Item>KEY_2</Item>
        </Keys>
      </Item>
    </ShortcutKeys>
  </Item>
...
  <Item>
    <CommandName>DebugCommandName_2</CommandName>
    <ShortcutKeys>
      <Item>
        <Keys>
          <Item>KEY_3</Item>
        </Keys>
      </Item>
      <Item>
        <Keys>
          <Item>KEY_4</Item>
        </Keys>
      </Item>
    </ShortcutKeys>
  </Item>
</Shortcuts>

For reasons beyond my control, I cannot get the format of the incoming XML changed to be more consistent, so I must account for both layouts of the ShortcutKeys section, as well as the Item element name being reused at several nesting levels.

Parsing the XML with minidom, I then use the following Python to extract content:

for item in parsedKeyComboFile.getElementsByTagName("Item"):
    if (item.getElementsByTagName("CommandName").length > 0):
        commandName = item.getElementsByTagName("CommandName")[0].childNodes[0].nodeValue
        print(commandName)
    elif (item.getElementsByTagName("Keys").length > 0):
        keyCombo = item.getElementsByTagName("Item")[0].childNodes[0].nodeValue
        print(keyCombo)

I'll eventually be adding this info to dictionaries, but for now the output I get for the above XML is:

DebugCommandName_1
KEY_1
DebugCommandName_2
KEY_3
KEY_4

when what I desire is:

DebugCommandName_1
KEY_1 KEY_2
DebugCommandName_2
KEY_3 KEY_4

(I realise I'm not properly formatting the print of the keys to achieve the single-line output. The key thing here is not skipping over the KEY_2 Item.)

I know that the [0] in the keyCombo = line limits me to the first occurrence of Item in Keys.

So, is there a way for me to inspect a top level Item and all its child elements, pulling out the single CommandName and all of the Keys Items inside that top-level Item, before then moving on to the next top level Item and repeating the process? I have utterly failed to achieve this so far.

Should I be using ElementTree?

Many thanks.


Solution

  • I have no experience with minidom, and its use is discouraged:

    Its use is not recommended, you probably want to use xml.etree.ElementTree instead.

    -- from the minidom tag info

    If you can use xml.etree.ElementTree instead, this may be a straightforward way:

    import xml.etree.ElementTree as ET

    tree = ET.parse('example.xml')

    # Walk every element in document order
    for elem in tree.iter():
        if elem.tag == 'CommandName':
            print(elem.text)
        elif elem.tag == 'Keys':
            # Each child <Item> of <Keys> holds one key name
            for item in elem:
                print(item.text)
    

    Prints:

    DebugCommandName_1
    KEY_1
    KEY_2
    DebugCommandName_2
    KEY_3
    KEY_4
    

    Or, if you want a list for each <Keys> tag, change the Keys branch of the loop to:

    if elem.tag == 'Keys':
        print([item.text for item in elem])
    

    Prints:

    DebugCommandName_1
    ['KEY_1', 'KEY_2']
    DebugCommandName_2
    ['KEY_3']
    ['KEY_4']
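
Neither variant above groups the keys per command, which is what the question asks for. A minimal sketch of one way to do that with ElementTree follows, assuming the top-level Item elements are direct children of Shortcuts as in the sample. The function name collect_shortcuts and the inline XML string are illustrative only (the "..." entries elided in the question are left out); with a real file, ET.parse(...).getroot() would take the place of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (elided entries omitted).
XML = """\
<Shortcuts>
  <Item>
    <CommandName>DebugCommandName_1</CommandName>
    <ShortcutKeys>
      <Item>
        <Keys>
          <Item>KEY_1</Item>
          <Item>KEY_2</Item>
        </Keys>
      </Item>
    </ShortcutKeys>
  </Item>
  <Item>
    <CommandName>DebugCommandName_2</CommandName>
    <ShortcutKeys>
      <Item>
        <Keys>
          <Item>KEY_3</Item>
        </Keys>
      </Item>
      <Item>
        <Keys>
          <Item>KEY_4</Item>
        </Keys>
      </Item>
    </ShortcutKeys>
  </Item>
</Shortcuts>
"""


def collect_shortcuts(xml_text):
    """Map each CommandName to the list of all key names found under it."""
    root = ET.fromstring(xml_text)
    shortcuts = {}
    # findall('Item') matches only direct children of <Shortcuts>,
    # so the nested <Item> elements are not visited by this loop.
    for top_item in root.findall('Item'):
        name = top_item.findtext('CommandName')
        # './/Keys/Item' matches every <Item> directly inside any <Keys>
        # below this top-level item, which covers both ShortcutKeys layouts.
        keys = [key.text for key in top_item.findall('.//Keys/Item')]
        shortcuts[name] = keys
    return shortcuts


shortcuts = collect_shortcuts(XML)
for name, keys in shortcuts.items():
    print(name)
    print(' '.join(keys))
```

For the sample XML this prints each command name followed by its keys on a single line (KEY_1 KEY_2, then KEY_3 KEY_4), and the returned dictionary can be used directly instead of printing.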