python xml lxml elementtree minidom

Basic Python Parsing XML with xml.etree - Issue


I am trying to parse XML and am having a hard time. I don't understand why the results keep printing [<Element 'Results' at 0x105fc6110>]. I am trying to extract Social from my example with the following code:

import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
results = root.findall("Results")
print results #[<Element 'Results' at 0x105fc6110>]
              # WHAT IS THIS??


for result in results:
    print result.find("Social") #None

The XML looks like this:

<?xml version="1.0"?>
<List1>
    <NextOffset>AAA</NextOffset>
    <Results>
        <R>
            <D>internet.com</D>
            <META>
                <Social>
                    <v>http://twitter.com/internet</v>
                    <v>http://facebook.com/internet</v>
                </Social>
                <Telephones>
                    <v>+1-555-555-6767</v>
                </Telephones>
            </META>
        </R>
    </Results>
</List1>

Solution

  • findall returns a list of xml.etree.ElementTree.Element objects, which is the list you see printed. In your case there is only one Results node, so you can use find to get the first (and only) match.

    Once you have it, call find with the .// syntax, which searches all descendants of the element rather than only the children directly under Results.

    Once you have found Social, call findall on the v tag and print each element's text:

    import xml.etree.ElementTree as ET
    root = ET.parse("test.xml")
    result = root.find("Results")
    
    social = result.find(".//Social")
    
    for r in social.findall("v"):
        print(r.text)
    

    results in:

    http://twitter.com/internet
    http://facebook.com/internet
    

    Note that I did not perform any validity checks on the XML file. You should check whether find returns None and handle the error accordingly.

    I am not an expert on the XML format myself; I learned everything I know about parsing it by following this lxml tutorial.
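As a rough sketch of the None checks mentioned above, the same lookup can be wrapped in a small function. The names social_links and XML are made up for this illustration, and the XML from the question is inlined with ET.fromstring so the snippet runs without test.xml. It also shows why find("Social") returned None in the question: a bare tag name only matches direct children.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's XML (hypothetical constant, stands in for test.xml).
XML = """<?xml version="1.0"?>
<List1>
    <NextOffset>AAA</NextOffset>
    <Results>
        <R>
            <D>internet.com</D>
            <META>
                <Social>
                    <v>http://twitter.com/internet</v>
                    <v>http://facebook.com/internet</v>
                </Social>
                <Telephones>
                    <v>+1-555-555-6767</v>
                </Telephones>
            </META>
        </R>
    </Results>
</List1>"""


def social_links(xml_text):
    """Hypothetical helper: text of every <v> under the first <Social>, or []."""
    root = ET.fromstring(xml_text)      # root is the <List1> element
    results = root.find("Results")      # direct child of <List1>
    if results is None:                 # no <Results> node at all
        return []
    social = results.find(".//Social")  # search all descendants of <Results>
    if social is None:                  # <Results> exists but has no <Social>
        return []
    return [v.text for v in social.findall("v")]


print(social_links(XML))
# ['http://twitter.com/internet', 'http://facebook.com/internet']
print(social_links("<List1><NextOffset>AAA</NextOffset></List1>"))
# []

# A bare tag name only looks at direct children, so this is None:
print(ET.fromstring(XML).find("Results").find("Social"))
# None
```

Returning an empty list is only one way to handle a missing node; raising an exception or logging a warning may suit your case better.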