How to retrieve titles from a Google search using rvest


I am working on a web scraping project using rvest. I have found useful posts about this task, but I am not getting the expected output. Basically, I want to extract the result titles after running a Google search. For that I use the following code (based on this post):

Web Scraping Google Result with R

library(rvest)
library(tidyverse)
#Code
#url
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
#Get data
first_page <- read_html(url)
titles <- html_nodes(first_page, xpath = "//div/div/div/a/div[not(div)]") %>% 
  html_text()

Which works and returns this:

titles
 [1] "www.facebook.com › Pages › Public figure › Artist"     
 [2] "mx.linkedin.com › mario-torres-84ab9b1b"               
 [3] "mx.linkedin.com › ingmariotorres"                      
 [4] "sic.cultura.gob.mx › ficha"                            
 [5] "www.meer.com › authors › 826-mario-torres-dujisin"     
 [6] "www.transfermarkt.es › mario-torres › profil › spieler"
 [7] "www.espn.com.ec › mma › peleador › mario-torres"       
 [8] "twitter.com › matorresr"                               
 [9] "es.wikipedia.org › wiki › Jaime_Torres_Bodet"          
[10] "www.instagram.com › mario_torres25"  

But I do not know whether it is possible to extract the title shown below each web link. Graphically, I mean these (I highlighted only the first two as an example, but I want all ten titles, matching the previous output):

[Screenshot: Google results with the first two titles highlighted]

Is that possible? Many thanks!

Edit: Is it possible to extract the text framed in red?

[Screenshot: the target text framed in red]


Solution

  • Google search results vary by locale and over time, so the list I get differs from yours. However, the XPath should be the same:

    html_nodes(first_page, xpath = "//div/div/div/a/h3") %>% html_text()
    #> [1] "Mario García Torres - Wikipedia"                              
    #> [2] "Mario Torres (@mario_torres25) • Instagram photos and videos" 
    #> [3] "Mario Torres - Regional manager Mexico and Central America"   
    #> [4] "Mario Lopez Torres - A Furniture And Art Experience"          
    #> [5] "Mario García Torres | The Guggenheim Museums and Foundation"  
    #> [6] "Mario Torres - Player profile | Transfermarkt"                
    #> [7] "Mario Torres Lopez - 33 For Sale on 1stDibs - 1stDibs"        
    #> [8] "Mario Lopez Torres - 12 For Sale at 1stdibs"                  
    #> [9] "Mario Lopez Torres Furniture | On the Town, Hispanic Heritage"
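
If you want each title kept next to its breadcrumb and link, one option is to select the result anchors first and then pull the pieces out of each anchor. The following is only a minimal sketch: the inline HTML is a hypothetical, simplified stand-in for the real page (which is more deeply nested and changes over time), and the names `html`, `anchors` and `results` are made up for illustration. The two XPath patterns are the ones from the question and the answer above.

```r
library(rvest)

# Hypothetical, simplified markup standing in for a Google results page.
# It only mirrors the div/div/div/a nesting that the XPath expressions rely on.
html <- '
<html><body>
<div><div><div>
  <a href="https://example.com/a">
    <h3>First example title</h3>
    <div>example.com</div>
  </a>
</div></div></div>
<div><div><div>
  <a href="https://example.org/b">
    <h3>Second example title</h3>
    <div>example.org</div>
  </a>
</div></div></div>
</body></html>'

page <- read_html(html)

# Select only the anchors that contain an <h3>, i.e. the result links
anchors <- html_nodes(page, xpath = "//div/div/div/a[h3]")

# html_node() returns one match per anchor, so the columns stay aligned
results <- data.frame(
  title = anchors %>% html_node(xpath = "./h3") %>% html_text(),
  path  = anchors %>% html_node(xpath = "./div[not(div)]") %>% html_text(),
  link  = anchors %>% html_attr("href"),
  stringsAsFactors = FALSE
)

print(results)
```

On a live page you would replace `read_html(html)` with `read_html(url)`. Working per anchor, rather than running two separate `html_nodes()` calls and binding the results, avoids mismatched lengths when a result lacks one of the pieces.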