I am trying to scrape information from this Google Scholar page:
https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science
library(rvest)
htmlfile<-"https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science"
g_interest<- read_html(htmlfile) %>% html_nodes("div.gsc_oai_int") %>% html_text()
I got the following result:
[1] "Quantum Chemistry Electronic Structure Condensed Matter Physics Materials Science Nanotechnology "
[2] "density functional theory first principles calculations many body theory condensed matter physics materials science "
[3] "chemistry materials science physics nanotechnology "
[4] "Materials Science Nanotechnology Chemistry Physics "
[5] "Physics Theoretical Physics Condensed Matter Theory Materials Science Nanoscience "
[6] "Materials Science Quantum Chemistry Fiber Optic Sensors Geophysics "
[7] "Chemical Physics Condensed Matter Materials Science Magnetic Properties NMR "
[8] "Materials Science "
[9] "Materials Science Physics "
[10] "Physics Materials Science Theoretical Physics Nanoscience "
However, I would like to get results like this:
[1]"Quantum Chemistry; Electronic Structure;Condensed Matter Physics; Materials Science; Nanotechnology "
......
Any suggestions on how to separate the individual interests with ";"?
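You can use the purrr and stringr packages. Calling html_text() on div.gsc_oai_int flattens all of an author's interests into one string, so the boundaries between them are lost. Instead, select the div.gsc_oai_int nodes first, then extract the individual .gsc_oai_one_int nodes inside each one and join their text with ";".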
library(rvest)
library(purrr)
library(stringr)
htmlfile<-"https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:materials_science"
content_nodes <- read_html(htmlfile) %>% html_nodes("div.gsc_oai_int")
map_chr(content_nodes, ~ .x %>%
  html_nodes(".gsc_oai_one_int") %>%
  html_text() %>%
  str_c(collapse = ";"))
Result:
[1] "Quantum Chemistry;Electronic Structure;Condensed Matter Physics;Materials Science;Nanotechnology"
[2] "density functional theory;first principles calculations;many body theory;condensed matter physics;materials science"
[3] "chemistry;materials science;physics;nanotechnology"
[4] "Materials Science;Nanotechnology;Chemistry;Physics"
[5] "Physics;Theoretical Physics;Condensed Matter Theory;Materials Science;Nanoscience"
[6] "Materials Science;Quantum Chemistry;Fiber Optic Sensors;Geophysics"
[7] "Chemical Physics;Condensed Matter;Materials Science;Magnetic Properties;NMR"
[8] "Materials Science"
[9] "Materials Science;Physics"
[10] "Physics;Materials Science;Theoretical Physics;Nanoscience"
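
If you want to try the approach without hitting Google Scholar (which may block or rate-limit automated requests), here is a minimal offline sketch. The HTML string is a hypothetical, heavily simplified stand-in for the real page: it only assumes one div.gsc_oai_int per author and one .gsc_oai_one_int element per interest. It also uses base R's vapply() and paste() in place of map_chr() and str_c(), which should behave the same way here if you prefer fewer dependencies.

```r
library(rvest)

# Hypothetical mock of the Scholar markup (an assumption for illustration):
# one div per author, one link per interest.
mock_html <- '
<div class="gsc_oai_int">
  <a class="gsc_oai_one_int">Materials Science</a>
  <a class="gsc_oai_one_int">Physics</a>
</div>
<div class="gsc_oai_int">
  <a class="gsc_oai_one_int">Chemistry</a>
</div>'

# One node per author.
author_nodes <- html_nodes(read_html(mock_html), "div.gsc_oai_int")

# For each author node, collect the text of its interest links
# and join them with ";". vapply() guarantees one string per author.
interests <- vapply(
  author_nodes,
  function(node) {
    paste(html_text(html_nodes(node, ".gsc_oai_one_int")), collapse = ";")
  },
  character(1)
)

print(interests)
```

With the mock input above, interests should contain "Materials Science;Physics" and "Chemistry". To get "; " with a trailing space as in the question, change the collapse argument accordingly.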