Scraping Zillow using for loop


I am scraping Zillow and want to scrape all the result pages. I am using the for loop shown below, but I seem to receive results from only the first page.

for (page_result in 1:20) {
  zillow_url = paste0("https://www.zillow.com/orlando-fl/",page_result,"_p/?searchQueryState=%7B%22
pagination%22%3A%7B%22currentPage%22%3A",page_result,"%7D%2C%22usersSearchTerm%22%3A%22
Orlando%2C%20Fl%22%2C%22mapBounds%22%3A%7B%22west%22%3A-81.6603646328125%2C%22east%22%3A-80.8144173671875%2C%22
south%22%3A28.191492307595613%2C%22north%22%3A28.794962421299882%7D%2C%22
regionSelection%22%3A%5B%7B%22regionId%22%3A13121%2C%22
regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex
%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D")
}

zpg = read_html(zillow_url)

res_all <-NULL
zillow_pg <-tibble(
  addr = zpg %>% html_nodes(".list-card-addr") %>% html_text(),
  price = zpg %>% html_nodes(".list-card-price") %>% html_text(),
  details = zpg %>% html_nodes(".list-card-details") %>% html_text() ,
  heading= zpg %>% html_nodes(".list-card-info a") %>% html_text() ,
  type = zpg %>% html_nodes(".list-card-statusText") %>% html_text())


res_all <- res_all %>% bind_rows(zillow_pg)


Solution

  • You might be interested in the ZillowR package

    https://www.rdocumentation.org/packages/ZillowR/versions/0.1.0

    Zillow, an online real estate company, provides real estate and mortgage data for the United States through a REST API. The ZillowR package provides an R function for each API service, making it easy to make API calls and process the response into convenient, R-friendly data structures. See http://www.zillow.com/howto/api/APIOverview.htm for the Zillow API Documentation.

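    Your code is 90% of the way there. The loop closes right after the URL is built, so read_html() runs only once, after the loop has finished, and res_all is reset to NULL just before the single bind_rows() call. I can't test this, but initializing res_all before the loop and moving the scraping and bind_rows() steps inside it should get you going in the right direction: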

    library(rvest)  # read_html(), html_nodes(), html_text()
    library(dplyr)  # %>%, tibble(), bind_rows()

    # Start empty, before the loop, so every page can be appended
    res_all <- NULL

    for (page_result in 1:20) {
      # Each piece is a separate string so no line breaks end up inside the URL
      zillow_url <- paste0(
        "https://www.zillow.com/orlando-fl/", page_result, "_p/?searchQueryState=%7B%22",
        "pagination%22%3A%7B%22currentPage%22%3A", page_result, "%7D%2C%22usersSearchTerm%22%3A%22",
        "Orlando%2C%20Fl%22%2C%22mapBounds%22%3A%7B%22west%22%3A-81.6603646328125%2C%22east%22%3A-80.8144173671875%2C%22",
        "south%22%3A28.191492307595613%2C%22north%22%3A28.794962421299882%7D%2C%22",
        "regionSelection%22%3A%5B%7B%22regionId%22%3A13121%2C%22",
        "regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex",
        "%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D"
      )

      # Fetch and parse this page inside the loop
      zpg <- read_html(zillow_url)

      zillow_pg <- tibble(
        addr    = zpg %>% html_nodes(".list-card-addr") %>% html_text(),
        price   = zpg %>% html_nodes(".list-card-price") %>% html_text(),
        details = zpg %>% html_nodes(".list-card-details") %>% html_text(),
        heading = zpg %>% html_nodes(".list-card-info a") %>% html_text(),
        type    = zpg %>% html_nodes(".list-card-statusText") %>% html_text()
      )

      # Append this page's rows to the running result
      res_all <- bind_rows(res_all, zillow_pg)
    }
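
The same accumulate-per-page pattern can also be written by collecting each page in a list and binding once at the end. The following is only a minimal base R sketch of that idea, not a tested scraper: `build_page_url()`, `scrape_pages()` and `fake_fetch()` are hypothetical helper names, the long searchQueryState query string is left out for brevity, and `fake_fetch()` is an offline stand-in with placeholder values for a function that would wrap `read_html()` and the `html_nodes()` calls above.

```r
# Build the URL for one results page. Assumption: only the "<page>_p/" path
# segment is shown; the searchQueryState query string would be appended the
# same way as in the answer above.
build_page_url <- function(page, base = "https://www.zillow.com/orlando-fl/") {
  paste0(base, page, "_p/")
}

# Scrape several pages. `fetch_page` is a hypothetical function that takes a
# URL and returns a data frame with that page's listings.
scrape_pages <- function(pages, fetch_page) {
  results <- vector("list", length(pages))
  for (i in seq_along(pages)) {
    # The fetch happens inside the loop, once per page
    results[[i]] <- fetch_page(build_page_url(pages[i]))
  }
  # Bind all per-page data frames into one
  do.call(rbind, results)
}

# Offline stand-in for fetch_page so the sketch runs without contacting the
# site. The values are placeholders, not real listings.
fake_fetch <- function(url) {
  data.frame(url = url, addr = c("addr 1", "addr 2"), stringsAsFactors = FALSE)
}

res_all <- scrape_pages(1:3, fake_fetch)
nrow(res_all)
```

With a real `fetch_page`, each element of the list would hold one page's tibble, and `dplyr::bind_rows(results)` could replace the `do.call(rbind, results)` line.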