
python xml.dom - access keys that are child of another key


I am trying to access data from an XML file structured as follows:

<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>

I would like to go through the name attribute of every item and, when one matches, retrieve its description. So far I can retrieve all the item elements and print the name attribute, but I can't find a way to access the child elements description and year.

from xml.dom import minidom

xmldoc = minidom.parse("myfile.xml")
# This retrieves all the item elements
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
# This prints the name attribute of the first item
print(itemlist[0].attributes['name'].value)
# This raises a KeyError, although I can see that description is a child element of the item
print(itemlist[1].attributes['description'].value)

I am not sure how to access the sub-elements, since they are children of the item element; do I need to create another itemlist from the item element list to retrieve the description key and access its value? Or am I totally off?


Solution

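  • description and year are child elements of item, not attributes, so they never appear in item.attributes; that is why the lookup raises a KeyError. Call getElementsByTagName on each item and read the text node inside the element it returns. Here's a way to extract the data. Not sure it's the most elegant one, but it works: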

    for item in xmldoc.getElementsByTagName("item"):
        name = item.attributes.getNamedItem("name").value
        print(f"name is {name}") 
        desc = item.getElementsByTagName("description")[0].childNodes[0].data
        print(f"description is {desc}")
    

    The output is:

    name is jack
    description is the headhunter
    name is joe
    description is the butler
    name is david
    description is guest
    

    Note that the minidom documentation is, well, kind of lacking. However, minidom (mostly) implements the W3C DOM standard, so the DOM documentation applies to it as well.
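Since the question also asks for the description only when a name matches, here is a minimal self-contained sketch of that lookup. It assumes the XML from the question is available as a string (hence parseString instead of parse), and the helpers child_text and find_item are hypothetical names introduced for illustration, not part of minidom:

```python
from xml.dom import minidom

# The sample document from the question, inlined so the example runs on its own.
XML = """<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>"""


def child_text(element, tag):
    """Return the text of the first <tag> descendant of element, or None."""
    matches = element.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    # The text lives in a Text node that is a child of the element.
    return matches[0].firstChild.data


def find_item(doc, wanted_name):
    """Return (description, year) for the item whose name attribute matches."""
    for item in doc.getElementsByTagName("item"):
        # name is an attribute, so it is read with getAttribute.
        if item.getAttribute("name") == wanted_name:
            return child_text(item, "description"), child_text(item, "year")
    return None


xmldoc = minidom.parseString(XML)
print(find_item(xmldoc, "joe"))  # ('the butler', '1995')
```

Calling getElementsByTagName on the item rather than on the whole document limits the search to that item's descendants, so the description inside header is never picked up. Values come back as strings, so year would need int() if a number is wanted.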