
Web scraping of nested links with R


I would like to scrape the links nested in the name of each property. The script runs, but instead of the URLs it retrieves only NAs. Could you help me find what I am missing in the script snippet?

Thank you

    # Scrape the price, name, locality and detail link of each flat
    # listed in the sreality.cz search results for Brno
    library(rvest)
    library(dplyr)

    link <- "https://www.sreality.cz/hledani/prodej/byty/brno?_escaped_fragment_="
    page <- read_html(link)

    # Price of each listing
    price <- page %>% 
      html_elements(".norm-price.ng-binding") %>% 
      html_text()

    # Property name
    name <- page %>% 
      html_elements(".name.ng-binding") %>% 
      html_text()

    # Locality (address)
    location <- page %>% 
      html_elements(".locality.ng-binding") %>% 
      html_text()

    # Link to each property's detail page (this is the column that comes out as NA)
    href <- page %>% 
      html_nodes(".name.ng-binding") %>% 
      html_attr("href") %>% paste("https://www.sreality.cz", ., sep="")

    flat <- data.frame(price, name, location, href, stringsAsFactors = FALSE)
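To reproduce the problem without depending on the live site, the same selector can be run against a small inline HTML fixture. The markup below is a hypothetical stand-in, not the real sreality.cz page: it assumes each property name sits in a span nested inside an a.title anchor that carries the link in an ng-href attribute, and the /detail/flat-1 paths are made up. html_elements() is the current rvest name for html_nodes().

```r
library(rvest)

# Hypothetical fixture modeled on the class names used in the script;
# the real sreality.cz markup may differ.
fixture <- '
<div>
  <a class="title" ng-href="/detail/flat-1"><span class="name ng-binding">Flat 1</span></a>
  <a class="title" ng-href="/detail/flat-2"><span class="name ng-binding">Flat 2</span></a>
</div>'
page <- read_html(fixture)

# Same selector as in the script: it matches the inner <span>, not the <a>
nodes <- html_elements(page, ".name.ng-binding")
html_name(nodes)          # "span" "span"

# A <span> has no href attribute, so html_attr() falls back to NA
html_attr(nodes, "href")  # NA NA

# paste() turns NA into the literal string "NA"
paste("https://www.sreality.cz", html_attr(nodes, "href"), sep = "")
```

So the href column holds either NA or strings ending in "NA", depending on whether paste() is applied.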


Solution

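  • Your CSS selector picks the element nested inside each anchor (the one holding the property name) rather than the anchor itself. That inner element has no href attribute, so html_attr() returns NA for it. Select the anchor and read its link attribute instead: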

     # Select the <a class="title"> anchors themselves and read their ng-href attribute
     href <- page %>% 
         html_nodes("a.title") %>%
         html_attr("ng-href") %>% 
         paste0("https://www.sreality.cz", .)
    

    paste0(...) is shorthand for paste(..., sep = "").
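When building the data frame, it can help to take the name and the link from the same set of anchors, so each name stays on the row of its own link. The sketch below runs offline against a hypothetical fixture (same assumption as the selector above: the name sits inside an a.title anchor carrying ng-href; the paths are made up). html_element() returns exactly one result per input anchor, so a listing missing a name gives NA on its row instead of shifting the column.

```r
library(rvest)

# Hypothetical fixture; the real sreality.cz markup may differ.
fixture <- '
<div>
  <a class="title" ng-href="/detail/flat-1"><span class="name ng-binding">Flat 1</span></a>
  <a class="title" ng-href="/detail/flat-2"><span class="name ng-binding">Flat 2</span></a>
</div>'
page <- read_html(fixture)

base_url <- "https://www.sreality.cz"

# Select the anchors once, then derive both columns from that node set
anchors <- html_elements(page, "a.title")
flat <- data.frame(
  name = html_text(html_element(anchors, ".name")),  # one name per anchor
  href = paste0(base_url, html_attr(anchors, "ng-href")),
  stringsAsFactors = FALSE
)
flat
```

With independent selectors, as in the question, vectors of different lengths either make data.frame() stop with an error or get recycled into misaligned rows.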