I have a list of html_nodes, and I want to check, for each page, whether each node exists, returning 1 if it does and 0 if it does not.
I have tried an "if" statement for each node manually, but as the nodes may change over time, I need to scrape all available nodes from the entire website and check each node on each page.
What I have
data <- foreach(i = urls) %dopar% {
  page <- read_html(i)  # parse each page once instead of once per node
  res1 <- page %>% html_nodes(xpath = node1) %>% html_text()
  if (length(res1) > 0) {
    res1 <- 1
  } else {
    res1 <- 0
  }
  res2 <- page %>% html_nodes(xpath = node2) %>% html_text()
  if (length(res2) > 0) {  # was length(node1), a typo
    res2 <- 1
  } else {
    res2 <- 0
  }
  c(res1, res2)  # return both indicators for this page
}
I need something similar to this (intuition):
data <- foreach(i = urls) %dopar% {
  res <- numeric(length(nodes))
  for (j in seq_along(nodes)) {
    node <- read_html(i) %>% html_nodes(xpath = nodes[j]) %>% html_text()
    res[j] <- if (length(node) > 0) 1 else 0
  }
  res  # return the indicator vector for this page
}
You were almost there; you do indeed need a loop over your nodes, and sapply
and friends make it concise:
data <- foreach(i = urls) %dopar% {
  page <- read_html(i)  # parse each page only once
  sapply(nodes, function(j)
    as.integer(length(page %>% html_nodes(xpath = j)) > 0))
}
Note that as.integer() converts the logical result into the 1/0 you asked for, and html_text() is not needed just to count matches.
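For completeness, here is a minimal self-contained sketch of the same presence check on a single in-memory page (the HTML string and the XPath expressions are made up purely for illustration):

```r
library(rvest)  # also provides read_html via xml2

# Hypothetical page and node list, just for illustration
page  <- read_html("<html><head><title>Hi</title></head><body><p>text</p></body></html>")
nodes <- c(title = "//title", table = "//table")

# 1 if the XPath matches anything on the page, 0 otherwise
flags <- sapply(nodes, function(j)
  as.integer(length(html_nodes(page, xpath = j)) > 0))
flags
# -> c(title = 1L, table = 0L)
```

Because `nodes` is a named vector, `sapply` returns a named vector, so you can see at a glance which node was found on which page; `do.call(rbind, data)` would then stack the per-page vectors into one matrix.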