I am trying to parse XML and am having a hard time. I don't understand why the results keep printing [<Element 'Results' at 0x105fc6110>].
I am trying to extract Social from my example with the following code:
import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
results = root.findall("Results")
print results #[<Element 'Results' at 0x105fc6110>]
# WHAT IS THIS??
for result in results:
    print result.find("Social") #None
The XML looks like this:
<?xml version="1.0"?>
<List1>
  <NextOffset>AAA</NextOffset>
  <Results>
    <R>
      <D>internet.com</D>
      <META>
        <Social>
          <v>http://twitter.com/internet</v>
          <v>http://facebook.com/internet</v>
        </Social>
        <Telephones>
          <v>+1-555-555-6767</v>
        </Telephones>
      </META>
    </R>
  </Results>
</List1>
findall returns a list of xml.etree.ElementTree.Element objects, which is why print shows [<Element 'Results' at 0x105fc6110>]. In your case there is only one Results node, so you could use find to get the first (here, the only) match.
Once you have it, call find with the .// syntax, which searches the whole subtree rather than only the direct children of Results. Social is nested under R and META, so a plain find("Social") on Results returns None.
Once you have found it, call findall on the v tag and print the text of each match:
import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
result = root.find("Results")
social = result.find(".//Social")
for r in social.findall("v"):
    print(r.text)
This results in:
http://twitter.com/internet
http://facebook.com/internet
Note that I did not perform any validity checks on the XML file. You should check whether find returns None and handle the error accordingly.
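One way to add those None checks is sketched below. This is only an illustration: the helper name extract_social_links and the inline SAMPLE_XML string are made up for this example (the XML is copied from the question so the snippet runs without test.xml), and returning an empty list is just one possible way of handling a missing node.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document so the sketch runs without test.xml.
SAMPLE_XML = """<?xml version="1.0"?>
<List1>
  <NextOffset>AAA</NextOffset>
  <Results>
    <R>
      <D>internet.com</D>
      <META>
        <Social>
          <v>http://twitter.com/internet</v>
          <v>http://facebook.com/internet</v>
        </Social>
        <Telephones>
          <v>+1-555-555-6767</v>
        </Telephones>
      </META>
    </R>
  </Results>
</List1>"""


def extract_social_links(xml_text):
    """Hypothetical helper: return the text of every <v> under <Social>.

    Returns an empty list when <Results> or <Social> is missing.
    """
    root = ET.fromstring(xml_text)  # root is the <List1> element

    result = root.find("Results")
    # Compare with None explicitly: an Element without children is falsy,
    # so "if not result" would also reject an empty but present node.
    if result is None:
        return []

    social = result.find(".//Social")  # search the whole subtree of <Results>
    if social is None:
        return []

    return [v.text for v in social.findall("v")]


print(extract_social_links(SAMPLE_XML))
print(extract_social_links("<List1><Results/></List1>"))
```

The first call prints the two URLs as a list; the second prints an empty list because Social is absent. Raising an exception instead of returning an empty list may suit you better if a missing node means the file is broken.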
I'm not very confident with the XML format myself; everything I know about parsing it I learned by following this lxml tutorial.