I am trying to parse the xmlValue of certain child nodes from an NCBI XML file. For some PMIDs, though, the root node <PubmedArticleSet> holds two different kinds of PubMed records, PubmedBookArticle and PubmedArticle. I would like to apply a condition: if (xmlName(fetch.pubmed) == "PubmedBookArticle"), extract certain values; else if (xmlName(fetch.pubmed) == "PubmedArticle"), extract other values. Finally, I want to build a data frame with both sets of values alongside their PMIDs. It seems simple, but xmlName(fetch.pubmed) throws the error: no applicable method for 'xmlName' applied to an object of class "c('XMLInternalDocument', 'XMLAbstractDocument')"
Any help is appreciated, thank you.
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2015//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_150101.dtd">
<PubmedArticleSet>
<PubmedBookArticle>
<BookDocument>
<PMID Version="1">25506969</PMID>
<ArticleIdList>
<ArticleId IdType="bookaccession">NBK259188</ArticleId>
</ArticleIdList> ....
...... </BookDocument>
</PubmedBookArticle>
<PubmedArticle>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">25013473</PMID>
<DateCreated>
<Year>2014</Year>
<Month>7</Month>
<Day>11</Day>
</DateCreated>....
....</MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>
My code is below
library(XML)
library(rentrez)
PM.ID <- c("25506969","25032371","24983039","24983034","24983032","24983031",
           "26386083","26273372","26066373","25837167",
           "25466451","25013473")
# rentrez function to retrieve the XML file for the above PMIDs
fetch.pubmed <- entrez_fetch(db = "pubmed", id = PM.ID,
                             rettype = "xml", parsed = TRUE)
# If empty records, return NA
FindNull <- function(x, x1child) {
  res <- xpathSApply(x, x1child, xmlValue)
  if (length(res) == 0) {
    out <- NA
  } else {
    out <- res
  }
  out
}
# extract contents from xml file
xpathSApply(fetch.pubmed,"//PubmedArticle",FindNull,x1child = './/ArticleTitle')
xpathSApply(fetch.pubmed,"//PubmedBookArticle",FindNull,x1child = './/BookTitle')
How do I put the above code in a loop, so that I can retrieve values within PubmedArticle and PubmedBookArticle as and when the condition is met in each search?
There are a few ways you could do this, but I would maybe get separate node sets for books and articles.
table( xpathSApply(fetch.pubmed, "/PubmedArticleSet/*", xmlName) )
    PubmedArticle PubmedBookArticle
                6                 6
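As for the error in the question: xmlName() works on nodes, not on the parsed document object itself, so it has to be given the root node or one of its children. A minimal sketch of that, using a tiny stand-in document (the inline XML string here is an assumption for illustration, not the real NCBI response):

```r
library(XML)

# Hypothetical stand-in for fetch.pubmed: same root and record element names
doc <- xmlParse(
  "<PubmedArticleSet><PubmedBookArticle/><PubmedArticle/></PubmedArticleSet>",
  asText = TRUE
)

root <- xmlRoot(doc)                  # the root *node*, not the document
root_name <- xmlName(root)            # name of <PubmedArticleSet>
child_names <- unname(sapply(xmlChildren(root), xmlName))  # one name per record

print(root_name)
print(child_names)
```

With the real data, xmlRoot(fetch.pubmed) should play the role of root here.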
books <- getNodeSet(fetch.pubmed, "/PubmedArticleSet/PubmedBookArticle")
data.frame( pmid = sapply(books, function(x) xpathSApply(x, ".//PMID", xmlValue)),
title = sapply(books, function(x) xpathSApply(x, ".//BookTitle", xmlValue))
)
pmid title
1 25506969 Probe Reports from the NIH Molecular Libraries Program
2 25032371 Understanding Climate’s Influence on Human Evolution
3 24983039 Assessing the Effects of the Gulf of Mexico Oil Spill on Human Health: A Summary of the June 2010 Workshop
4 24983034 In the Light of Evolution: Volume IV: The Human Condition
5 24983032 The Role of Human Factors in Home Health Care: Workshop Summary
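If you would rather have one data frame covering both record types, one possible approach is to walk over every child of the root and pick the title path from the node name. The sketch below runs on a small made-up document (the PMIDs and titles are placeholders, and the helper name first_or_na is hypothetical); with the real data you would pass fetch.pubmed instead of doc. It keeps only the first match per record, on the assumption that a record can contain more than one PMID or title element.

```r
library(XML)

# Made-up stand-in with the same layout as the NCBI sample in the question
doc <- xmlParse('<PubmedArticleSet>
  <PubmedBookArticle><BookDocument><PMID Version="1">1</PMID>
    <Book><BookTitle>Example Book</BookTitle></Book>
  </BookDocument></PubmedBookArticle>
  <PubmedArticle><MedlineCitation><PMID Version="1">2</PMID>
    <Article><ArticleTitle>Example Article</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>', asText = TRUE)

# Hypothetical helper: first matching value under a node, or NA if none
first_or_na <- function(node, path) {
  res <- xpathSApply(node, path, xmlValue)
  if (length(res) == 0) NA_character_ else res[[1]]
}

# Every record under the root, books and articles alike
records <- getNodeSet(doc, "/PubmedArticleSet/*")

out <- do.call(rbind, lapply(records, function(rec) {
  type <- xmlName(rec)
  # Choose the title element according to the record type
  title_path <- if (type == "PubmedBookArticle") ".//BookTitle" else ".//ArticleTitle"
  data.frame(pmid  = first_or_na(rec, ".//PMID"),
             type  = type,
             title = first_or_na(rec, title_path),
             stringsAsFactors = FALSE)
}))

print(out)
```

Records with no matching title come back as NA rather than being dropped, so each row stays lined up with its PMID.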