Obtain PubMed info from PMIDs using RefManageR - in a loop


I am trying to retrieve citation information from PubMed using RefManageR and PubMed IDs (PMIDs).

I chose RefManageR because its output is really easy to convert to a data.frame, and because I still find it difficult to understand and use the PubMed API on my own.

I was able to write code that uses a "string of PMIds" as input to get the data:

require(RCurl)
urli <- getURL("https://gist.githubusercontent.com/aurora-mareviv/3840512f6777d5293218/raw/dfd6b76ceb22c52aa073fc05211dcea986406914/pmids.csv", ssl.verifypeer = FALSE)
pmids <- read.csv(textConnection(urli))
head(pmids)
index10 <- pmids$pmId[1:10]
indice10 <- paste(pmids$pmId[1:10], collapse=" ")

# install.packages("RefManageR")
library(RefManageR)
auth.pm10 <- ReadPubMed(indice10, database = "PubMed", mindate = 1950)
auth.pm10d <- data.frame(auth.pm10)
View(auth.pm10)

However, if I want to get citations for 500 PMIDs, I think I should avoid sending one long query to the PubMed server. My idea is to write a function that loops through all the elements of the vector index10, something like this:

extract.pub <-
  function(id=indice, dbase=d.base, mindat=1950){
    require(RefManageR)
    indice <- id # Author
    d.base <- dbase # like PubMed, etc
    min.dat <- mindat # Date from...
    auth.pm <- NULL
    for(i in indice){
      auth.pm <-  ReadPubMed(indice, database = d.base, mindate = min.dat)
      }
    auth.pm <- data.frame(auth.pm)
    auth.pm
   }

cites <- extract.pub(index10, dbase="PubMed")
View(cites)

It gives the following error: Error : Internal server error.

However, if I insert indice10 (string) instead of index10 (vector), it works:

cites <- extract.pub(indice10, dbase="PubMed")
View(cites)

How can I make this loop work? Or is this approach not the best one for my purposes?


Solution

  • ReadPubMed only accepts one pmid or query per function call. Try:

    lapply(pmids$pmId[1:3], ReadPubMed, database = "PubMed", mindate = 1950)
    

    gives

    [[1]]
    [1] P. M. Zeltzer, B. Bodey, A. Marlin, et al. “Immunophenotype profile of childhood
    medulloblastomas and supratentorial primitive neuroectodermal tumors using 16 monoclonal
    antibodies”. Eng. In: _Cancer_ 66.2 (1990), pp. 273-83. PMID: 2196109.
    
    [[2]]
    [1] L. C. Rome, R. P. Funke and R. M. Alexander. “The influence of temperature on muscle
    velocity and sustained performance in swimming carp”. Eng. In: _The Journal of
    experimental biology_ 154 (1990), pp. 163-78. PMID: 2277258.
    
    [[3]]
    [1] P. Henry. “[Headache, facial neuralgia. Diagnostic orientation and management]”. Fre.
    In: _La Revue du praticien_ 40.7 (1990), pp. 677-81. PMID: 2326596.
    

    You can put the fields of each BibEntry object into a data.frame and format the authors nicely:

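    lapply(pmids$pmId[1:3], function(x){
     # one request per PMID; unlist() turns the BibEntry into a list of fields
     tmp <- unlist(ReadPubMed(x, database = "PubMed", mindate = 1950))
     # collapse "person" fields (e.g. author) into one comma-separated string
     tmp <- lapply(tmp, function(z) if(is(z, "person")) paste0(z, collapse = ",") else z)
     data.frame(tmp, stringsAsFactors = FALSE)
    })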
    

    gives

                                                                                                                                        title
    1 Immunophenotype profile of childhood medulloblastomas and supratentorial primitive neuroectodermal tumors using 16 monoclonal antibodies
    2                                               The influence of temperature on muscle velocity and sustained performance in swimming carp
    3                                                                      [Headache, facial neuralgia. Diagnostic orientation and management]
                                       author year                             journal volume number  pages  eprint language eprinttype bibtype
    1 P M Zeltzer,B Bodey,A Marlin,J Kemshead 1990                              Cancer     66      2 273-83 2196109      eng     pubmed Article
    2        L C Rome,R P Funke,R M Alexander 1990 The Journal of experimental biology    154   <NA> 163-78 2277258      eng     pubmed Article
    3                                 P Henry 1990               La Revue du praticien     40      7 677-81 2326596      fre     pubmed Article
         dateobj                        key
    1 1990-01-01 zeltzer1990immunophenotype
    2 1990-01-01          rome1990influence
    3 1990-01-01          henry1990headache
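
For the 500-PMID case in the question, the per-ID calls above could be wrapped in a loop that survives individual failures, with the one-row data frames then stacked into a single table. What follows is only a minimal sketch under stated assumptions: fetch_each(), bind_fill(), fetch_one and pause are hypothetical names introduced here, not part of RefManageR, and the half-second pause is an arbitrary choice meant to stay well under the request rate NCBI asks clients to respect.

```r
# Sketch only. fetch_one is a stand-in for whatever retrieves one record,
# e.g. the anonymous function passed to lapply() in the answer above.
# fetch_each(), bind_fill() and pause are hypothetical, not RefManageR API.

fetch_each <- function(ids, fetch_one, pause = 0.5) {
  ids <- as.character(ids)
  out <- vector("list", length(ids))
  names(out) <- ids
  for (k in seq_along(ids)) {
    # A failed request is stored as NULL so one bad ID does not stop the loop.
    res <- tryCatch(fetch_one(ids[k]), error = function(e) NULL)
    out[k] <- list(res)  # the list() wrapper keeps NULL entries in place
    Sys.sleep(pause)     # assumed delay between requests
  }
  out
}

bind_fill <- function(dfs) {
  # Drop failed lookups, then give every data frame the same set of columns.
  dfs <- Filter(Negate(is.null), dfs)
  cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (nm in setdiff(cols, names(d))) d[[nm]] <- NA  # e.g. a missing "number"
    d[cols]
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

# Possible usage (requires RefManageR and network access, so left commented):
# one_row <- function(x) {
#   tmp <- unlist(ReadPubMed(x, database = "PubMed", mindate = 1950))
#   tmp <- lapply(tmp, function(z) if (is(z, "person")) paste0(z, collapse = ",") else z)
#   data.frame(tmp, stringsAsFactors = FALSE)
# }
# cites <- bind_fill(fetch_each(pmids$pmId, one_row))
```

Passing the fetching function in as an argument keeps the loop independent of RefManageR, so it can be tried with a dummy function before any request is sent. IDs whose lookup failed remain in the list returned by fetch_each() as NULL entries, which makes it easy to see which ones to retry.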