
Automating efetch does not return an xml file


Keywords: Entrez NCBI PubMed Python3.7 BeautifulSoup xml

I would like to retrieve XML data for a list of PubMed IDs. When I use the example URL from the Entrez website (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=10890170&retmode=xml), the data downloads correctly as an XML file. But when I automate the request by substituting a variable (temp_id) for the ID, plain text seems to be returned instead of XML.

As a result I get this error (presumably because there is no XML document with XML tags to parse):

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    pub_doi = soup.find(idtype="doi").text
AttributeError: 'NoneType' object has no attribute 'text'

from bs4 import BeautifulSoup
import certifi
import urllib3
temp_id=str(10890170)
#efetch_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=10890170&retmode=xml'#this url works

base_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
efetch_url = '%sefetch.fcgi?db=pubmed&id=%s&retmode=xml' % (base_url, temp_id)
try:
    # Verify TLS certificates against the certifi CA bundle
    http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
    url = efetch_url
    results = http.request('GET', url)
    soup = BeautifulSoup(results.data,features='xml')
    pub_doi = soup.find(idtype="doi").text
    pub_abstract = soup.pubmedarticleset.pubmedarticle.article.abstract.abstracttext.text
except (urllib3.exceptions.HTTPError, IOError) as e:
    print("ERROR!", e)
else:
    pass

For some reason, when I copy and paste the URL into my browser, it is displayed as plain text in Safari and as XML in Chrome.

I would appreciate some help, as I suspect my URL is not constructed correctly.


Solution

  • It turned out the issue was in the way Beautiful Soup handled the URL. I used ElementTree instead, and it worked.
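
The answer does not show its code, so the following is only a minimal sketch of what an ElementTree version might look like, not the poster's actual solution. The helper names `fetch_pubmed_xml` and `parse_doi_and_abstract` are hypothetical, and the sketch uses the standard library's `urllib.request` rather than `urllib3`. One likely explanation for the original error is that PubMed's XML uses mixed-case names (`ArticleId`, `IdType`, `AbstractText`) and Beautiful Soup's XML parser keeps that case, so a lowercase lookup such as `idtype="doi"` finds nothing. The ElementTree paths below use the mixed-case names.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Same efetch URL pattern as in the question; {pmid} is filled in per request.
EFETCH_URL = (
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    "?db=pubmed&id={pmid}&retmode=xml"
)


def fetch_pubmed_xml(pmid):
    """Download the raw efetch XML for one PubMed ID (hypothetical helper)."""
    with urllib.request.urlopen(EFETCH_URL.format(pmid=pmid)) as response:
        return response.read()


def parse_doi_and_abstract(xml_bytes):
    """Return (doi, abstract) from efetch XML; either may be None if absent."""
    root = ET.fromstring(xml_bytes)

    # Element and attribute names are case-sensitive: ArticleId / IdType.
    doi_node = root.find(".//PubmedData/ArticleIdList/ArticleId[@IdType='doi']")
    doi = doi_node.text if doi_node is not None else None

    # An abstract can be split into several AbstractText sections.
    sections = root.findall(".//Abstract/AbstractText")
    abstract = " ".join("".join(s.itertext()).strip() for s in sections) or None

    return doi, abstract
```

It could be used as `doi, abstract = parse_doi_and_abstract(fetch_pubmed_xml("10890170"))`. Returning `None` for a missing DOI or abstract avoids the `AttributeError` from calling `.text` on a failed lookup, which matters when looping over a list of IDs where some records lack those fields.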