r, rvest

How to retrieve hyperlinks in google search using rvest


I am using rvest to get the hyperlinks from a Google search results page. User @AllanCameron helped me sketch this code in the past, but I do not know how to change the XPath, or what else I need to do, to get the links. Here is my code:

library(rvest)
library(tidyverse)
#Code
#url
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
#Get data
first_page <- read_html(url)
links <- html_nodes(first_page, xpath = "//div/div/a/h3") %>% 
  html_attr('href')

This returns nothing but NA values.

I would like to get the link for each result item that appears on the results page.

Is it possible to store them in a data frame? Many thanks!


Solution

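  • The h3 nodes are the result titles and carry no href attribute of their own, which is why html_attr('href') returns NA. Instead, step up to the parent a element of each h3 node and read its href attribute. This yields exactly one link per title, which makes it easy to arrange both in a data frame.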

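    # Select the result titles (same XPath as in the question)
    titles <- html_nodes(first_page, xpath = "//div/div/a/h3")
    
    titles %>%
      # Step up from each h3 to the a element that wraps it
      html_elements(xpath = "./parent::a") %>%
      html_attr("href") %>%
      # Keep the part from "https" up to the first "&" (str_extract is from stringr)
      str_extract("https.*?(?=&)")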
    
    [1] "https://www.linkedin.com/in/mario-torres-b5796315b"                                                           
    [2] "https://mariolopeztorres.com/"                                                                                
    [3] "https://www.instagram.com/mario_torres25/%3Fhl%3Den"                                                          
    [4] "https://www.1stdibs.com/buy/mario-torres-lopez/"                                                              
    [5] "https://m.facebook.com/2064681987175832"                                                                      
    [6] "https://www.facebook.com/mariotorresmx"                                                                       
    [7] "https://www.transfermarkt.us/mario-torres/profil/spieler/28167"                                               
    [8] "https://en.wikipedia.org/wiki/Mario_Garc%25C3%25ADa_Torres"                                                   
    [9] "https://circawho.com/press-and-magazines/mario-lopez-torress-legacy-is-still-being-woven-in-michoacan-mexico/"
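
The question also asks for a data frame, so here is a minimal sketch of that last step. It assumes the same XPath still matches, and it uses a small inline HTML snippet (hypothetical, with made-up example.com links) in place of the live Google page so that it runs offline. The names sample_html and results are illustrative only. Decoding with URLdecode is an optional extra that turns escapes such as %3F back into their characters.

```r
library(rvest)
library(stringr)

# Hypothetical stand-in for the downloaded results page (not real Google markup)
sample_html <- '<html><body>
<div><div><a href="/url?q=https://example.com/page-one&amp;sa=U"><h3>First result</h3></a></div></div>
<div><div><a href="/url?q=https://example.org/%3Fhl%3Den&amp;sa=U"><h3>Second result</h3></a></div></div>
</body></html>'

page <- read_html(sample_html)

# Result titles, selected as in the answer above
titles <- html_elements(page, xpath = "//div/div/a/h3")

# One href per title, taken from the parent a element
links <- titles %>%
  html_elements(xpath = "./parent::a") %>%
  html_attr("href") %>%
  str_extract("https.*?(?=&)")

# Titles and links have the same length, so they fit side by side
results <- data.frame(
  title = html_text2(titles),
  link = vapply(links, URLdecode, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(results)
```

With the live page, replacing page with first_page from the question should be the only change needed, provided Google still serves markup that the XPath matches.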