
Can Biopython's Entrez module pull full PubMed articles from a list of PMIDs?


I've been reading the documentation and testing Entrez functions for the last two days, and I have it pulling abstracts from PMIDs just fine.

But I can't find a clear yes/no answer on whether Entrez can pull a text version of the full article body instead of just the abstract field.

I think I might be missing something in the XML parsing and just need a little clarification, because I haven't been able to find this in the documentation. Thanks very much for any assistance.
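For context on the XML side, here is a minimal sketch of pulling the title and abstract out of the XML that `Entrez.efetch(db="pubmed", id=..., retmode="xml")` returns. It assumes the standard library's `xml.etree.ElementTree` rather than `Entrez.read` (the question doesn't show its parsing code); `abstracts_from_pubmed_xml`, `_all_text`, and the sample record are hypothetical placeholders. In a PubMed record the text lives in `ArticleTitle` and `Abstract/AbstractText`; there is no element for the article body.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _all_text(elem) -> str:
    # Join text from the element and its children (titles/abstracts may contain <i>, <sup>, ...)
    return "".join(elem.itertext()).strip() if elem is not None else ""


def abstracts_from_pubmed_xml(xml_text) -> Dict[str, Dict[str, str]]:
    """Map each PMID to its title and abstract from a PubmedArticleSet XML document."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    records: Dict[str, Dict[str, str]] = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID", default="").strip()
        article = citation.find("Article")
        if article is None:
            continue
        # Structured abstracts are split into several AbstractText elements, each with a Label
        parts = []
        for node in article.iterfind("Abstract/AbstractText"):
            label = node.get("Label")
            text = _all_text(node)
            parts.append(f"{label}: {text}" if label else text)
        records[pmid] = {
            "title": _all_text(article.find("ArticleTitle")),
            "abstract": "\n".join(parts),
        }
    return records


# Hypothetical, trimmed record: the PMID and text are placeholders, not real data
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">00000000</PMID>
      <Article>
        <ArticleTitle>A placeholder title.</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">Placeholder background.</AbstractText>
          <AbstractText Label="RESULTS">Placeholder results.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

records = abstracts_from_pubmed_xml(SAMPLE)
print(records["00000000"]["abstract"])
```

With real data, the result of `handle.read()` on the efetch handle can be passed straight to the function.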


Solution

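  • Entrez cannot extract the full article text (or the PDF) from PubMed: an efetch request against the PubMed database returns only the citation record, such as the title, authors, and abstract. You could instead try to download the PDF through metapub, and if you only want the text, extract it with textract.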

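    import metapub
    from urllib.request import urlretrieve
    import textract
    
    pmid = '20147967'
    pdf_path = f'{pmid}.pdf'  # where the downloaded PDF is saved
    txt_path = f'{pmid}.txt'  # where the extracted text is written
    
    # FindIt looks up a publisher link to the article's PDF; url is None if none was found
    src = metapub.FindIt(pmid)
    if src.url is None:
        raise RuntimeError(f'No PDF link found for PMID {pmid}: {src.reason}')
    
    urlretrieve(src.url, pdf_path)
    
    # method='pdftotext' shells out to the pdftotext CLI (from poppler-utils);
    # textract.process returns bytes, so decode them before writing to a text file
    text = textract.process(pdf_path, extension='pdf', method='pdftotext')
    
    with open(txt_path, 'w', encoding='utf-8') as textfile:
        textfile.write(text.decode('utf-8'))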
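Since the question starts from a list of PMIDs, the same lookup can be run in a loop. This is a minimal sketch, assuming metapub's `FindIt` exposes a `url` attribute (None when no link is found) and a `reason` attribute explaining the miss; `find_pdf_urls` and its `lookup` parameter are hypothetical names, and the parameter exists only so the loop can be tried with a stub instead of network calls.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def find_pdf_urls(
    pmids: Iterable[str],
    lookup: Optional[Callable] = None,
) -> Tuple[Dict[str, str], Dict[str, Optional[str]]]:
    """Split PMIDs into those with a PDF link and those without one."""
    if lookup is None:
        # Imported lazily so the loop can be exercised offline with a stub lookup
        from metapub import FindIt
        lookup = FindIt

    found: Dict[str, str] = {}
    missing: Dict[str, Optional[str]] = {}
    for pmid in pmids:
        src = lookup(pmid)  # FindIt(pmid) by default
        if src.url:
            found[pmid] = src.url
        else:
            # Keep the reason so failed lookups can be inspected later
            missing[pmid] = src.reason
    return found, missing
```

For example, `found, missing = find_pdf_urls(['20147967'])` uses FindIt and needs network access; each URL in `found` can then be downloaded and run through textract one at a time.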