python, bioinformatics, biopython, pubmed

How to automatically save PMC full text to disk through BioPython?


Is there a way to download the XML files that the Entrez module returns and save them to local disk? What I am currently doing is:

fetch = Entrez.efetch(db='pmc',
                      retmode='xml',
                      id=ids,
                      rettype='full')
article = fetch.read()

I then save article, which is a str object, as an XML file using Python's write function.

Does BioPython provide a way to automatically download the files onto the disk?


Solution

  • I don't think Biopython provides a way to do this, but it doesn't need to: you can write the response to a file directly, without first assigning it to an intermediate variable:

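    from Bio import Entrez

    fetch = Entrez.efetch(db='pmc',
                          retmode='xml',
                          id=ids,
                          rettype='full')

    # Write the response straight to disk. If read() returns bytes
    # (as it may on newer Biopython versions), open the file with 'wb'.
    with open('fileNameToSave.xml', 'w') as f:
        f.write(fetch.read())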
    

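    An alternative approach, as Chris_Rands points out in his comment, is to fetch the record directly from the E-utilities URL. The example below requests a protein record; for PMC full text you would use db=pmc with the article's ID: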

    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=15718680
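
The URL approach can be scripted with the standard library alone. The following is only a minimal sketch: build_efetch_url and save_url_to_file are hypothetical helper names (not part of Biopython or E-utilities), and the retmode=xml default is an assumption carried over from the question.

```python
import shutil
import urllib.parse
import urllib.request

# Base efetch endpoint, taken from the URL shown in the answer above.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, retmode="xml"):
    """Build an efetch URL for one or more record IDs (hypothetical helper)."""
    if isinstance(ids, (str, int)):
        ids = [ids]
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "retmode": retmode,  # assumption: XML output is wanted
    }
    # safe="," keeps the comma-separated ID list readable in the URL.
    return EFETCH_URL + "?" + urllib.parse.urlencode(params, safe=",")


def save_url_to_file(url, path):
    """Stream the response body to disk as raw bytes (hypothetical helper)."""
    # Writing in binary mode avoids any str/bytes mismatch.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Example call (performs a network request, so it is left commented out):
# save_url_to_file(build_efetch_url("protein", "15718680"), "15718680.xml")
```

Streaming with shutil.copyfileobj means the whole document never has to be held in memory, which may help when fetching many or large records.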