Tags: r, web, web-scraping, rvest, xml2

Using rvest, xml2 and SelectorGadget for web scraping results in xml_missing <NA>


I'm trying to scrape information from the following URL:

https://www.google.com/search?q=812-800%20H%20St%20NW

I want to retrieve the highlighted "812 H St NW": [target][1]

SelectorGadget (a Chrome extension) suggests the following selector: ".desktop-title-content"

However, I get NA as the result, and I don't understand how to fix this.

Here is my code:

library(magrittr)  # provides the %>% pipe used below

link <- "https://www.google.com/search?q=812-800%20H%20St%20NW"
xml2::read_html(link) %>%
  rvest::html_node(".desktop-title-content") %>%
  rvest::html_text()

[1] NA

Thank you.

[1]: https://i.sstatic.net/mzY75.png


Solution

  • It looks like the content I want is generated by JavaScript, so it is not present in the raw HTML that xml2::read_html() downloads. Therefore, I need to create a .js file and run it with PhantomJS, as per this tutorial: https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r

    Then I will be able to use rvest to scrape the correct content from the rendered page.

    Unfortunately, I need to do this for around 2000 different links, so I will be looking for a way to create the 2000 ".js" files automatically.

    Thanks for your answers.
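
One possible way to avoid writing 2000 separate ".js" files is a single PhantomJS script that takes the URL and an output path as command-line arguments, called from R once per link. This is only a sketch, not the tutorial's method: the function names write_render_script() and scrape_rendered() and the file name "render.js" are hypothetical, and it assumes the phantomjs executable is on the PATH.

```r
# Hypothetical helper: write ONE reusable PhantomJS script to disk.
# The script reads the URL and the output file from its command-line
# arguments, so no per-link .js file is needed.
write_render_script <- function(path = "render.js") {
  js <- c(
    "var system = require('system');",
    "var fs = require('fs');",
    "var page = require('webpage').create();",
    "var url = system.args[1];      // first argument after the script name",
    "var outFile = system.args[2];  // where to save the rendered HTML",
    "page.open(url, function (status) {",
    "  if (status === 'success') {",
    "    fs.write(outFile, page.content, 'w');",
    "  }",
    "  phantom.exit(status === 'success' ? 0 : 1);",
    "});"
  )
  writeLines(js, path)
  invisible(path)
}

# Hypothetical helper: render one URL with PhantomJS, then extract the
# text of the first node matching `selector` from the saved HTML.
# Assumes `phantomjs` is installed and on the PATH.
scrape_rendered <- function(url, script = "render.js",
                            selector = ".desktop-title-content") {
  out_file <- tempfile(fileext = ".html")
  status <- system2("phantomjs", args = shQuote(c(script, url, out_file)))
  if (status != 0 || !file.exists(out_file)) {
    return(NA_character_)  # rendering failed; keep the result vector aligned
  }
  page <- xml2::read_html(out_file)
  rvest::html_text(rvest::html_node(page, selector))
}
```

With a character vector of links (here called links, also an assumption), the whole batch could then be processed with write_render_script() followed by vapply(links, scrape_rendered, character(1)). Links that fail to render come back as NA rather than stopping the loop.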