Tags: python-3.x, biopython, pubmed

Biopython and retrieving journal's full name


I am using Biopython with Python 3.x to search the PubMed database. The searches return the expected results, but I also need to extract the journal name (the full name, not just the abbreviation) for each result. Currently I am using the following code:

from Bio import Entrez
from Bio import Medline

Entrez.email = "my_email@gmail.com"
handle = Entrez.esearch(db="pubmed", term="search_term", retmax=20)
record = Entrez.read(handle)
handle.close()

idlist = record["IdList"]

records = list(records)

for record in records:
    print("source:", record.get("SO", "?"))

So this works fine, but record.get("SO", "?") returns only the abbreviation of the journal (for example, N Engl J Med, not New England Journal of Medicine). In my experience with manual PubMed searches, you can search using either the abbreviation or the full name, and PubMed treats them the same way, so I wondered whether there is also a parameter for getting the full name.


Solution

  • So this works fine, but record.get("SO", "?") returns only the abbreviation of the journal

    No it doesn't. It won't even run due to this line:

    records = list(records)
    

    as records isn't defined. And even if you fix that, all you get back from:

    idlist = record["IdList"]
    

    is a list of ID strings like ['17510654', '2246389'] that are intended to be passed to an Entrez.efetch() call to get the actual data. So when you call record.get("SO", "?") on one of these strings, your code blows up (again), because a str has no get() method.

    The "SO" (Source) field is defined to include the Journal Title Abbreviation (TA) as part of what it returns. You likely want "JT" (Journal Title) instead, as defined in MEDLINE/PubMed Data Element (Field) Descriptions. But neither of these fields is involved in this lookup.

    Here's a rework of your code to get the article title and the title of the journal that it's in:

    from Bio import Entrez
    
    Entrez.email = "my_email@gmail.com"  # change this to be your email address
    handle = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmax=20)
    record = Entrez.read(handle)
    handle.close()
    
    for identifier in record['IdList']:
        # Fetch the full PubMed record for this ID as XML
        pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
        result = Entrez.read(pubmed_entry)
        pubmed_entry.close()
        article = result['PubmedArticle'][0]['MedlineCitation']['Article']
    
        # The Journal element's Title holds the full journal name
        print('"{}" in "{}"'.format(article['ArticleTitle'], article['Journal']['Title']))
    

    OUTPUT

    > python3 test.py
    "Of wombats and whales: telomere tales in Madrid. Conference on telomeres and telomerase." in "EMBO reports"
    "Spontaneous proliferations in Australian marsupials--a survey and review. 1. Macropods, koalas, wombats, possums and gliders." in "Journal of comparative pathology"
    >
    

    Details can be found in the document: MEDLINE PubMed XML Element Descriptions
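
For completeness, the "JT" route mentioned above could look roughly like the sketch below. It assumes the records are fetched in MEDLINE text format with Entrez.efetch(rettype="medline", retmode="text") and parsed with Medline.parse, so that each record behaves like a dict keyed by MEDLINE field tags. The helper names journal_names and fetch_medline_records are made up for illustration and are not part of Biopython.

```python
def journal_names(medline_records):
    """Return (abbreviation, full title) pairs from parsed MEDLINE records.

    Each record is expected to behave like a dict keyed by MEDLINE field
    tags: "TA" is the journal title abbreviation, "JT" the full journal title.
    Missing fields are reported as "?".
    """
    return [(rec.get("TA", "?"), rec.get("JT", "?")) for rec in medline_records]


def fetch_medline_records(id_list, email):
    """Fetch the given PubMed IDs in MEDLINE text format (needs network access)."""
    # Imported here so journal_names() can be tried without Biopython installed.
    from Bio import Entrez, Medline

    Entrez.email = email
    # One request for all IDs instead of one request per ID
    handle = Entrez.efetch(db="pubmed", id=",".join(id_list),
                           rettype="medline", retmode="text")
    try:
        return list(Medline.parse(handle))
    finally:
        handle.close()


# Hypothetical usage (requires Biopython and network access):
# records = fetch_medline_records(["17510654", "2246389"], "my_email@gmail.com")
# for abbreviation, full_title in journal_names(records):
#     print(abbreviation, "->", full_title)
```

Whether the XML or the MEDLINE route is more convenient probably depends on what else you need from each record: the XML keeps the full nested structure, while the MEDLINE records are flat and easier to scan by field tag.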