I've been scraping data from an API using R with the httr and plyr libraries. It's pretty straightforward and works well with the following code:
library(httr)
library(plyr)
headers <- c("Accept" = "application/json, text/javascript",
"Accept-Encoding" = "gzip, deflate, sdch",
"Connection" = "keep-alive",
"Referer" = "http://www.afl.com.au/stat",
"Host" = "www.afl.com.au",
"User-Agent" = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
"X-Requested-With"= "XMLHttpRequest",
"X-media-mis-token" = "f31fcfedacc75b1f1b07d5a08887f078")
query <- GET("http://www.afl.com.au/api/cfs/afl/season?seasonId=CD_S2016014", add_headers(headers))
stats <- httr::content(query)
My question concerns the request token required in the headers (i.e. X-media-mis-token). It is easy to get manually by inspecting the XHR requests in Chrome or Firefox, but the token changes every 24 hours, which makes manual extraction a pain.
Is it possible to query the web page and extract this token automatically using R?
You can get the X-media-mis-token by sending a POST request to the site's token endpoint, but it comes with a disclaimer. ;)
library(httr)
token_url <- 'http://www.afl.com.au/api/cfs/afl/WMCTok'
token <- POST(token_url, encode="json")
content(token)$token
#[1] "f31fcfedacc75b1f1b07d5a08887f078"
content(token)$disclaimer
#[1] "All content and material contained within this site is protected by copyright owned by or licensed to Telstra. Unauthorised reproduction, publishing, transmission, distribution, copying or other use is prohibited."
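To avoid copying the token by hand every day, the two steps can be combined so that a fresh token is requested before each query. The following is only a minimal sketch: the helper names `get_token`, `with_token` and `fetch_stats` are hypothetical, and it assumes the token endpoint keeps returning a `token` field as shown above. Whether the API also needs the other headers from the question (Referer, User-Agent and so on) is not something this answer has verified, so they can be passed through the `extra` argument if required.

```r
library(httr)

# Hypothetical helper: ask the token endpoint shown above for a fresh token.
get_token <- function(token_url = "http://www.afl.com.au/api/cfs/afl/WMCTok") {
  resp <- POST(token_url, encode = "json")
  stop_for_status(resp)  # raise an R error on any HTTP error status
  content(resp)$token
}

# Hypothetical helper: merge the token into a named character vector of
# headers, replacing any stale X-media-mis-token entry already in `extra`.
with_token <- function(token, extra = character()) {
  extra <- extra[names(extra) != "X-media-mis-token"]
  c(extra, "X-media-mis-token" = token)
}

# Hypothetical helper: fetch a fresh token, then query the API with it.
fetch_stats <- function(url, extra = character()) {
  resp <- GET(url, add_headers(.headers = with_token(get_token(), extra)))
  stop_for_status(resp)
  content(resp)
}

# Example call (needs network access, so it is left commented out):
# stats <- fetch_stats("http://www.afl.com.au/api/cfs/afl/season?seasonId=CD_S2016014")
```

Passing the `headers` vector from the question as `extra` reuses the original headers while swapping the hard-coded token for the one just fetched.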