python biopython ncbi pubmed

Does Bio.Entrez's efetch() retrieve all metadata of a PubMed article?


Given a PMID as input, does Bio.Entrez's efetch() retrieve all the metadata of a PubMed article? In other words, does PubMed hold any metadata beyond what efetch() returns?


For example, I see that for the PMID 23954024, efetch() retrieves an abstract that contains a bit less information than the abstract on PubMed's website (http://www.ncbi.nlm.nih.gov/pubmed/23954024):

efetch():

"AbstractText": [
    "Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.", 
    "A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.", 
    "One private physiotherapy clinic, northern England.", 
    "Twenty-four participants with rotator cuff tendinopathy.", 
    "The intervention was a programme of self-managed loaded exercise. The control group received usual physiotherapy treatment.", 
    "Baseline assessment comprised the Shoulder Pain and Disability Index (SPADI) and the Short-Form 36, repeated three months post randomisation.", 
    "The recruitment target was met and the majority of participants (98%) were willing to be randomised. 100% retention was attained with all participants completing the SPADI at three months. Exercise adherence rates were excellent (90%). The mean change in SPADI score was -23.7 (95% CI -14.4 to -33.3) points for the self-managed exercise group and -19.0 (95% CI -6.0 to -31.9) points for the usual physiotherapy treatment group. The difference in three month SPADI scores was 0.1 (95% CI -16.6 to 16.9) points in favour of the usual physiotherapy treatment group.", 
    "In keeping with previous research which indicates the need for further evaluation of self-managed loaded exercise for rotator cuff tendinopathy, these methods and the preliminary evaluation of outcome offer a foundation and stimulus to conduct a substantive study."
], 

http://www.ncbi.nlm.nih.gov/pubmed/23954024:

Abstract

OBJECTIVES:
Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.

DESIGN:
A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.

SETTING:
One private physiotherapy clinic, northern England.

PARTICIPANTS:
Twenty-four participants with rotator cuff tendinopathy.

INTERVENTIONS:
The intervention was a programme of self-managed loaded exercise. The control group received usual physiotherapy treatment.

MAIN OUTCOMES:
Baseline assessment comprised the Shoulder Pain and Disability Index (SPADI) and the Short-Form 36, repeated three months post randomisation.

RESULTS:
The recruitment target was met and the majority of participants (98%) were willing to be randomised. 100% retention was attained with all participants completing the SPADI at three months. Exercise adherence rates were excellent (90%). The mean change in SPADI score was -23.7 (95% CI -14.4 to -33.3) points for the self-managed exercise group and -19.0 (95% CI -6.0 to -31.9) points for the usual physiotherapy treatment group. The difference in three month SPADI scores was 0.1 (95% CI -16.6 to 16.9) points in favour of the usual physiotherapy treatment group.

CONCLUSIONS:
In keeping with previous research which indicates the need for further evaluation of self-managed loaded exercise for rotator cuff tendinopathy, these methods and the preliminary evaluation of outcome offer a foundation and stimulus to conduct a substantive study.

(The section labels OBJECTIVES, DESIGN, SETTING, etc. are missing from efetch()'s abstract.)

What other metadata does efetch() miss, and is there any way to retrieve the missing information programmatically?


Solution

  • The information is not missing. In the raw XML, the section labels are attributes of each AbstractText element:

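    from Bio import Entrez

    # NCBI asks for a contact address; replace this placeholder with your own.
    Entrez.email = "your.name@example.com"

    # Request the full PubMed record as XML (retmode selects the format).
    handle = Entrez.efetch(db="pubmed", id="23954024", retmode="xml")

    print(handle.read())
    handle.close()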
    

    Part of the output:

    <Abstract>
     <AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.</AbstractText>
     <AbstractText Label="DESIGN" NlmCategory="METHODS">A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.</AbstractText>
     <AbstractText Label="SETTING" NlmCategory="METHODS">One private physiotherapy clinic, northern England.</AbstractText>
    [...]
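
    To get the labels programmatically, read the Label attribute of each AbstractText element. Below is a minimal sketch using only the standard library's xml.etree.ElementTree on an abridged copy of the output above, so it runs without network access. The function name labelled_sections and the shortened sample text are illustrative assumptions, not part of Biopython or of the original answer. If you parse with Entrez.read() instead, the parsed strings should, as far as I know, carry the same information in an .attributes dictionary, but that route is not shown here.

```python
import xml.etree.ElementTree as ET

# Abridged sample of the efetch XML shown above (first section shortened).
SAMPLE_XML = """
<Abstract>
  <AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">Rotator cuff tendinopathy is a common source of shoulder pain.</AbstractText>
  <AbstractText Label="DESIGN" NlmCategory="METHODS">A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.</AbstractText>
  <AbstractText Label="SETTING" NlmCategory="METHODS">One private physiotherapy clinic, northern England.</AbstractText>
</Abstract>
"""


def labelled_sections(xml_text):
    """Return a list of (label, text) pairs, one per AbstractText element.

    Hypothetical helper for illustration; label is None when the
    element has no Label attribute (unstructured abstracts).
    """
    root = ET.fromstring(xml_text.strip())
    sections = []
    for node in root.iter("AbstractText"):
        label = node.get("Label")
        # itertext() also collects text nested in inline child tags.
        text = "".join(node.itertext()).strip()
        sections.append((label, text))
    return sections


sections = labelled_sections(SAMPLE_XML)
for label, text in sections:
    print(f"{label}: {text}" if label else text)
```

    On a real record, pass the string returned by handle.read() to the same function; root.iter() searches the whole tree, so the AbstractText elements are found even though they sit deep inside the PubmedArticle structure.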