r data-files ncbi pubmed doi

Is there R code to get the data files for an article from its PubMed ID or DOI?


I am trying to use R to get the names of the data files on NCBI or PubMed that are related or attached to hundreds of unique DOIs or PMIDs. For example, I have PMID 19122651 and want to get the names of the three GSEs connected to it: GSE12781, GSE12782, and GSE12783. I have searched various sources and packages to no avail.
I would appreciate your assistance.


Solution

  • You can do this using the rentrez package.

    The required function is entrez_link.

    Example:

    library(rentrez)
    
    results <- entrez_link(dbfrom = 'pubmed', id = 19122651, db = 'gds')
    
    results$links$pubmed_gds
    [1] "200012783" "200012782" "200012781"
    

    The three results are the IDs of the associated GEO DataSets (gds) records. You can convert them to GSE accessions using entrez_summary.

    Here's a somewhat ugly sapply that may serve as the basis for a function:

    sapply(results$links$pubmed_gds, function (id) entrez_summary("gds", id)$accession, 
           USE.NAMES = FALSE)
    
    [1] "GSE12783" "GSE12782" "GSE12781"
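
Since the question involves hundreds of PMIDs, the two steps above can be wrapped into a function and applied to a vector of IDs. The following is only a sketch: `pmid_to_gse`, `pmids_to_gse`, and the `pause` argument are hypothetical names, not part of rentrez, and both the `^GSE` filter and the pause length are assumptions you may want to change.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): GSE accessions linked to one PMID.
pmid_to_gse <- function(pmid) {
  links <- entrez_link(dbfrom = "pubmed", id = pmid, db = "gds")
  gds_ids <- links$links$pubmed_gds
  # No linked GEO records: return an empty character vector.
  if (is.null(gds_ids)) {
    return(character(0))
  }
  # One summary request for all linked IDs; always_return_list keeps the
  # result a list even when there is only a single ID.
  summaries <- entrez_summary(db = "gds", id = gds_ids,
                              always_return_list = TRUE)
  accessions <- as.character(unlist(lapply(summaries, function(s) s$accession),
                                    use.names = FALSE))
  # Assumption: only series (GSE) accessions are wanted.
  accessions[grepl("^GSE", accessions)]
}

# Hypothetical wrapper: named list of accessions, one entry per PMID.
pmids_to_gse <- function(pmids, pause = 0.5) {
  pmids <- as.character(pmids)
  out <- lapply(pmids, function(p) {
    Sys.sleep(pause)  # crude, assumed pause to be gentle on NCBI's servers
    pmid_to_gse(p)
  })
  setNames(out, pmids)
}
```

Calling `pmids_to_gse(c("19122651"))` should then return a list with one element named "19122651" holding the accessions shown above. PMIDs with no linked GEO records come back as empty character vectors, which makes them easy to spot afterwards.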