I've been reading documentation and testing Entrez functions for the last 2 days, and I have it working so that it pulls Abstracts just fine from PMIDs.
But I can't find a clear yes/no answer on whether Entrez can pull a text version of the full article body, instead of just the abstract field.
I think I might be missing something on the XML parsing, and just need a little clarification because I haven't been able to find it in the documentation. Thanks very much for any assistance.
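Querying PubMed through Entrez cannot return the full article text (or PDF); the records hold only the citation data and the abstract. You could try to download a PDF through metapub. If you want just the text, you can then extract it from the PDF with textract.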
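import metapub
import textract
from urllib.request import urlretrieve

pmid = '20147967'
pdf_path = 'article.pdf'  # placeholder: where the downloaded PDF is saved
txt_path = 'article.txt'  # placeholder: where the extracted text is written

# FindIt tries to locate a downloadable PDF; url is None if it finds none
src = metapub.FindIt(pmid)
if src.url is None:
    raise SystemExit('No PDF found: {}'.format(src.reason))
urlretrieve(src.url, pdf_path)

# textract.process returns bytes, so the output file is opened in binary mode
with open(txt_path, "wb") as textfile:
    textfile.write(textract.process(
        pdf_path,
        extension='pdf',
        method='pdftotext',
        encoding="utf_8",
    ))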
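
Because textract.process returns a byte string rather than str, the result either has to be written in binary mode or decoded first. If you want to work with the text as a normal Python string before saving it, a minimal sketch of the decoding route could look like this. The helper name save_extracted_text and its parameters are my own, not part of textract or metapub, and it uses only the standard library:

```python
from pathlib import Path


def save_extracted_text(raw: bytes, out_path: str, encoding: str = "utf-8") -> int:
    """Decode extractor output and write it as text.

    `raw` is assumed to be the bytes returned by a PDF text extractor.
    Bytes that are invalid in `encoding` are replaced rather than raising.
    Returns the number of characters written.
    """
    text = raw.decode(encoding, errors="replace")
    Path(out_path).write_text(text, encoding=encoding)
    return len(text)
```

You would call it as save_extracted_text(textract.process(pdf_path, method='pdftotext'), txt_path), assuming textract and its pdftotext backend are installed.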