r, web-scraping, rvest, httr

Extract data for all the instances from a dynamic website


I'm trying to scrape the data on all gym locations from https://www.xercise4less.co.uk/find-a-gym/.

In Developer Tools I found that the page calls a Web API at https://www.xercise4less.co.uk/Umbraco/Api/FindAGymApi/GetAll, which should return this information, but when I open that URL in the browser I get:

The 'ObjectContent`1' type failed to serialize the response body for content type 'text/xml; charset=utf-8'

I get the same error if I run the following code:

# user_agent argument is optional here and results are the same whether I include it or not 
httr::GET('https://www.xercise4less.co.uk/Umbraco/Api/FindAGymApi/GetAll',  httr::user_agent("httr"))

Any ideas on how to go about this?

Alternatively, I can (almost) access all the gym IDs with:

library(rvest)
library(magrittr)

url <- "https://www.xercise4less.co.uk/find-a-gym/"
my_pg <- read_html(url) 
my_pg %>% html_nodes('select > option')

But then I'm still not sure about how to iterate over all the IDs in order to get the complete list of coordinates/locations. Thanks for any pointers.


Solution

  • You are pretty much there. You just need to set the request header the server expects (Accept, asking for JSON); then a single request returns the info for all the gyms.

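    library(httr)

    # Ask for JSON explicitly; the error above shows the server otherwise fails serializing to text/xml
    headers <- c('Accept' = 'application/json, text/javascript, */*; q=0.01')
    resp <- httr::GET(url = 'https://www.xercise4less.co.uk/Umbraco/Api/FindAGymApi/GetAll', httr::add_headers(.headers = headers))
    # content() parses the JSON body into a nested R list
    r <- content(resp)
    print(r)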
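
If a table is easier to work with than a nested list, one option is to parse the response text with jsonlite instead. This is only a minimal sketch: `parse_gyms` and `fetch_gyms` are hypothetical helper names, and the field names in the sample payload (`Id`, `Name`, `Location`) are made up for illustration, since the real structure of the response may differ.

```r
library(httr)
library(jsonlite)

# Parse a JSON array of records into a data frame.
# flatten = TRUE expands nested objects into dotted column names.
parse_gyms <- function(json_text) {
  jsonlite::fromJSON(json_text, flatten = TRUE)
}

# Request JSON via the Accept header, stop on HTTP errors,
# then parse the raw response text.
fetch_gyms <- function(url) {
  resp <- httr::GET(
    url,
    httr::add_headers(Accept = "application/json, text/javascript, */*; q=0.01")
  )
  httr::stop_for_status(resp)
  parse_gyms(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical sample payload (assumed shape, not the real API output).
sample_json <- '[
  {"Id": 1, "Name": "Gym A", "Location": {"Lat": 53.8, "Lng": -1.5}},
  {"Id": 2, "Name": "Gym B", "Location": {"Lat": 51.5, "Lng": -0.1}}
]'
gyms <- parse_gyms(sample_json)
str(gyms)
```

Against the live endpoint this would be called as `fetch_gyms('https://www.xercise4less.co.uk/Umbraco/Api/FindAGymApi/GetAll')`. Inspect the result with `str()` first to see which columns actually come back.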