Tags: python, xml, python-3.x, xml-parsing, pubmed

Parsing XML: Finding Interesting Elements Using ElementTree


I am using urllib and ElementTree to fetch and parse the XML returned by PubMed API calls.

For example:

# Modules for sending requests to URLs and parsing XML
# Python 3.4, using IEP (Interactive Editor for Python) as the IDE
import urllib.request
import xml.etree.ElementTree as ET

# Fetch the eSummary record for PubMed ID 1757056 and parse it into an Element
url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056'
with urllib.request.urlopen(url) as id_request:
    id_pubmed = id_request.read()  # raw bytes; ET.fromstring accepts bytes
root = ET.fromstring(id_pubmed)
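The request above hardcodes the query string. A minimal sketch of building the same eSummary URL from its parameters with urllib.parse.urlencode is shown below; build_esummary_url and ESUMMARY_BASE are hypothetical names introduced for illustration, and the base URL uses HTTPS, which NCBI E-utilities serves.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities eSummary endpoint (HTTPS)
ESUMMARY_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmid, db="pubmed"):
    """Return an eSummary URL for one ID (hypothetical helper, for illustration)."""
    # A list of pairs keeps the parameter order stable on every Python 3 version;
    # urlencode escapes each value and joins the pairs with '&'
    query = urlencode([("db", db), ("id", str(pmid))])
    return ESUMMARY_BASE + "?" + query


url = build_esummary_url(1757056)
print(url)
```

The resulting string can be passed to urllib.request.urlopen exactly like the hardcoded one.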

I have been able to use ElementTree to load the data into the object root with ET.fromstring. My problem now is that I am having trouble finding the elements I am interested in (such as the authors) within this object.

I am referring to https://docs.python.org/2/library/xml.etree.elementtree.html, and the XML I get back looks like this: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056
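When the layout of a response is unfamiliar, it can help to print every element's tag and attributes before writing any queries. The sketch below does this on a short, hypothetical sample modeled on the eSummary layout (DocSum, Id, and Item elements carrying a Name attribute); the author names and title are placeholders, not the real record for this ID. The root returned by ET.fromstring for a live response can be walked the same way.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modeled on the eSummary layout;
# the Author and Title values are placeholders, not the real record.
SAMPLE = """<eSummaryResult>
<DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
        <Item Name="Author" Type="String">Author One</Item>
        <Item Name="Author" Type="String">Author Two</Item>
    </Item>
    <Item Name="Title" Type="String">Placeholder title</Item>
</DocSum>
</eSummaryResult>"""

root = ET.fromstring(SAMPLE)

# iter() with no argument visits root and every descendant in document order
for elem in root.iter():
    print(elem.tag, elem.attrib)
```

The listing shows that no element has the tag Author: each author is an Item element whose Name attribute is "Author".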

I have tried:

# Parse attempt: nothing is printed
for author in root.iter('Author'):
    print(author.attrib)

I also tried:

# Also prints nothing for author
for author in root.findall('Id'):
    author = author.find('author').text
    print(author)
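Both attempts come back empty for different reasons: iter('Author') matches tag names, but the authors are stored in Item elements, and findall('Id') only searches the direct children of root, while Id sits one level deeper under DocSum. The sketch below reproduces this on a hypothetical, trimmed eSummary-style sample (placeholder values, not the real record).

```python
import xml.etree.ElementTree as ET

# Hypothetical eSummary-style sample; Author values are placeholders
SAMPLE = """<eSummaryResult>
<DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
        <Item Name="Author" Type="String">Author One</Item>
        <Item Name="Author" Type="String">Author Two</Item>
    </Item>
</DocSum>
</eSummaryResult>"""

root = ET.fromstring(SAMPLE)

# Attempt 1: iter() matches tag names, and no tag is literally "Author"
print(list(root.iter('Author')))  # []

# Attempt 2: findall() with a bare tag only checks root's direct children
print(root.findall('Id'))  # []

# Spelling out the path, or using iter() (all descendants), does find Id
print(root.findall('DocSum/Id')[0].text)  # 1757056
print(next(root.iter('Id')).text)  # 1757056

# Even once Id is found, it has no child elements at all (and tag names
# are case-sensitive), so .find('author') returns None and .text would fail
print(root.find('DocSum/Id').find('author'))  # None
```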

Solution

  • Try iterating over the Item elements and checking each one's Name attribute:

    for author in root.iter('Item'):
        if author.get('Name') == 'Author':
            print(author.text)
    

    Or:

    author_list = [x.text for x in root.iter('Item') if x.get('Name') == 'Author']
    

    ElementTree can also select by attribute directly, because its limited XPath syntax supports predicates of the form [@attrib='value'].
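A minimal sketch of selecting the authors with an attribute predicate instead of an explicit if test, again on a hypothetical eSummary-style sample with placeholder names. The ".//Item[@Name='Author']" expression uses the [@attrib='value'] predicate from ElementTree's XPath subset; the second query shows the same predicate scoped to the AuthorList container.

```python
import xml.etree.ElementTree as ET

# Hypothetical eSummary-style sample; Author values are placeholders
SAMPLE = """<eSummaryResult>
<DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
        <Item Name="Author" Type="String">Author One</Item>
        <Item Name="Author" Type="String">Author Two</Item>
    </Item>
</DocSum>
</eSummaryResult>"""

root = ET.fromstring(SAMPLE)

# './/Item' searches all descendants; [@Name='Author'] keeps only the
# Item elements whose Name attribute equals "Author"
authors = [item.text for item in root.findall(".//Item[@Name='Author']")]
print(authors)  # ['Author One', 'Author Two']

# Same predicate, but first locate the AuthorList and search only its children
author_list = root.find(".//Item[@Name='AuthorList']")
print([a.text for a in author_list.findall("Item[@Name='Author']")])
```

Scoping the search to AuthorList is a design choice: it avoids picking up any other Item that might happen to share the same Name elsewhere in the record.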