I am trying to use R to get the names of the data sets in NCBI that are related or attached to hundreds of unique DOIs or PMIDs. For example, for PMID 19122651 I want to get the accessions of the three GEO series (GSEs) connected to it: GSE12781, GSE12782, and GSE12783.
I have searched various sources and packages to no avail.
I'd appreciate your assistance.
You can do this using the rentrez package.
The required function is entrez_link.
Example:
library(rentrez)
results <- entrez_link(dbfrom = 'pubmed', id = 19122651, db = 'gds')
results$links$pubmed_gds
[1] "200012783" "200012782" "200012781"
The three results are the UIDs of the associated records in the GEO DataSets (gds) database. You can convert them to GSE accessions using entrez_summary.
Here's a somewhat ugly sapply that may serve as the basis for a function:
sapply(results$links$pubmed_gds, function (id) entrez_summary("gds", id)$accession,
USE.NAMES = FALSE)
[1] "GSE12783" "GSE12782" "GSE12781"
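Since the question involves hundreds of PMIDs, the two steps above could be wrapped into one function. The sketch below is a hypothetical helper (`pmids_to_geo` is not part of rentrez). It assumes that `entrez_link(..., by_id = TRUE)` returns one elink object per PMID in the same order as the input, and it uses rentrez's `extract_from_esummary` so that all gds UIDs for a PMID are summarised in a single request instead of one request each. The `chunk_size` of 50 is an arbitrary assumption meant to keep each request URL short; it is not an NCBI-specified limit.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): for each PMID, return the GEO
# accessions (e.g. "GSE12781") of the gds records linked to it.
# chunk_size = 50 is an assumed value, chosen only to keep request URLs short.
pmids_to_geo <- function(pmids, chunk_size = 50) {
  pmids <- as.character(pmids)
  chunks <- split(pmids, ceiling(seq_along(pmids) / chunk_size))
  out <- list()
  for (chunk in chunks) {
    # by_id = TRUE asks for one elink object per PMID (assumed to be in
    # input order) instead of one merged set of links for the whole chunk
    links <- entrez_link(dbfrom = "pubmed", id = chunk, db = "gds", by_id = TRUE)
    for (i in seq_along(chunk)) {
      gds_ids <- links[[i]]$links$pubmed_gds
      accessions <- character(0)  # PMIDs with no linked gds records stay empty
      if (length(gds_ids) > 0) {
        # One esummary request covers all gds UIDs linked to this PMID
        summaries <- entrez_summary(db = "gds", id = gds_ids)
        accessions <- unname(extract_from_esummary(summaries, "accession"))
      }
      out[[chunk[i]]] <- accessions
    }
  }
  out
}
```

The result is a list named by PMID, so `pmids_to_geo(19122651)[["19122651"]]` should contain the same three GSE accessions as the sapply output above (possibly in a different order). For large batches, registering an NCBI API key with `set_entrez_key()` raises the number of requests NCBI allows per second.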