Tags: python, xml-parsing, url-parsing

XML parsing - findall() list comes up empty


I'm stuck on an assignment dealing with URL and XML parsing. I've got the data out, but I can't get findall() to work. I know that once findall() works I'll have a list to loop through. Any insight would be great; I'm hoping for a gentle nudge rather than an outright answer, if possible. Thank you!

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

raw_data = fhand.read().decode()
xml_data = ET.fromstring(raw_data)
lst = xml_data.findall('name')
print(lst)

Solution

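  • findall is not recursive: called with a bare tag such as 'name', it only matches elements that are direct children of the element you call it on. Here the name elements sit several levels below the root, so findall('name') returns an empty list. To reach deeper elements you have to pass a path.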

    Instead, use iter:

    import urllib.request
    import xml.etree.ElementTree as ET
    
    fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
    
    raw_data = fhand.read().decode()
    xml_data = ET.fromstring(raw_data)
    for name_node in xml_data.iter('name'):
        print(name_node.text)
    

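    or findall with the full path to the elements (ElementTree supports a limited subset of XPath):

    for name_node in xml_data.findall('comments/comment/name'):
        print(name_node.text)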
    

    Both will output

    Romina
    Laurie
    Bayli
    Siyona
    Taisha
    Alanda
    Ameelia
    Prasheeta
    Asif
    Risa
    Zi
    Danyil
    Ediomi
    Barry
    Lance
    Hattie
    Mathu
    Bowie
    Samara
    Uchenna
    Shauni
    Georgia
    Rivan
    Kenan
    Hassan
    Isma
    Samanthalee
    Alexa
    Caine
    Grady
    Anne
    Rihan
    Alexei
    Indie
    Rhuairidh
    Annoushka
    Kenzi
    Shahd
    Irvine
    Carys
    Skye
    Atiya
    Rohan
    Nuala
    Maram
    Carlo
    Japleen
    Breeanna
    Zaaine
    Inika
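
To see the difference without fetching anything, here is a minimal sketch that runs the same lookups against a small inline document. The sample XML and the names in it are made up; it only assumes the same nesting as the path used above (root, then comments, then comment, then name). The './/name' form is another path ElementTree accepts, matching name elements at any depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, NOT the real feed: it only mimics the assumed
# nesting root -> comments -> comment -> name.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Alice</name></comment>
    <comment><name>Bob</name></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# A bare tag only looks at direct children of root, so nothing matches.
direct = root.findall('name')

# Spelling out the full path from root reaches the nested elements.
full_path = [node.text for node in root.findall('comments/comment/name')]

# './/name' searches all descendants of root.
any_depth = [node.text for node in root.findall('.//name')]

# iter() walks the whole tree and yields every element with that tag.
via_iter = [node.text for node in root.iter('name')]

print(direct)     # []
print(full_path)  # ['Alice', 'Bob']
print(any_depth)  # ['Alice', 'Bob']
print(via_iter)   # ['Alice', 'Bob']
```

The full path is the strictest option, since it only matches name elements at that exact position, while './/name' and iter('name') pick up every name element wherever it appears in the tree.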