python xml attributes minidom

Python read xml list by key-value


I am trying to read a quote list via Python. The list looks like this:

<quotelist
    xmlns="http://www.w3schools.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="quotationlist.xsd">
    <quote key = "0">
        <author>Author 0</author>
        <text>Text 0</text>
    </quote>
    <quote key = "1">
        <author>Author 1</author>
        <text>Text 1.</text>
    </quote>
    <quote key = "2">
        <author>Author 2</author>
        <text>Text 2.</text>
    </quote>
</quotelist>

I would like to show one quote per day, so the key is the day of the year (0 to 364). But I am struggling to read out the quote for a given day with Python.

from xml.dom import minidom
dayOfYear = 44 #not relevant, I know how to find this out
mydoc = minidom.parse('./media/quotes.xml')
items = mydoc.getElementsByTagName('quote')
print(items)

This gives me a list of the 365 quote elements, which is what I expected. But what function is there to find the quote whose key equals "dayOfYear"? Is there a way to avoid loading all of them? And how do I then get the values of author and text?


Solution

  • You'll have to build that data structure on your own. In this case I chose a nested dict:

    items = mydoc.getElementsByTagName('quote')
    output = {int(item.getAttribute('key')): {'author': item.getElementsByTagName('author')[0].firstChild.nodeValue,
                                              'text': item.getElementsByTagName('text')[0].firstChild.nodeValue}
              for item in items}
    
    print(output)
    

    Outputs

    {0: {'author': 'Author 0',
         'text': 'Text 0'},
     1: {'author': 'Author 1',
         'text': 'Text 1.'},
     2: {'author': 'Author 2',
         'text': 'Text 2.'}}
    

    Then you can directly access any "day" you want, e.g. output[0] or output[1]. The individual values are available as output[dayOfYear]['author'] and output[dayOfYear]['text'].
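
If only a single day is needed, building the full dict is optional. The following is a minimal sketch, not the only way to do it: it loops over the quote elements and stops at the first one whose key attribute matches. The helper name find_quote and the inline SAMPLE string are assumptions made for illustration; in the question the document comes from minidom.parse('./media/quotes.xml') instead.

```python
from xml.dom import minidom

# Inline sample standing in for ./media/quotes.xml (assumption for illustration).
SAMPLE = """<quotelist xmlns="http://www.w3schools.com">
    <quote key="0"><author>Author 0</author><text>Text 0</text></quote>
    <quote key="1"><author>Author 1</author><text>Text 1.</text></quote>
    <quote key="2"><author>Author 2</author><text>Text 2.</text></quote>
</quotelist>"""


def find_quote(doc, day):
    """Return (author, text) for the quote whose key equals day, or None.

    Hypothetical helper, not part of minidom.
    """
    for item in doc.getElementsByTagName('quote'):
        # getAttribute returns a string, so compare against str(day).
        if item.getAttribute('key') == str(day):
            author = item.getElementsByTagName('author')[0].firstChild.nodeValue
            text = item.getElementsByTagName('text')[0].firstChild.nodeValue
            return author, text
    return None  # no quote with that key


mydoc = minidom.parseString(SAMPLE)
print(find_quote(mydoc, 1))  # ('Author 1', 'Text 1.')
```

Note that minidom still parses the whole file into memory before any lookup, so this only avoids building the extra dict, not loading the document. For a file of 365 short quotes that should be negligible.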