I've been trying to do some web scraping with R, and for several pages it has been relatively easy. But I've been struggling for weeks with one particular web page:
https://www.commerzbank.de/de/hauptnavigation/kunden/kursinfo/devisenk/weitere_waehrungen___indikative_kurse/indikative_kurse.jsp
The problem, I think, is that the page ultimately loads its data using JavaScript.
At first I thought it was a very simple case: after all, it is just a link you put in the browser to see the data, so I assumed it was a good old HTTP GET request and naively tried something like this:
library(httr)
url <- "https://www.commerzbank.de/de/hauptnavigation/kunden/kursinfo/devisenk/weitere_waehrungen___indikative_kurse/indikative_kurse.jsp"
res1 <- GET(url = url)
As it didn't work, I checked how the web page actually works, and it is as follows. First, it sets some cookies and a couple of parameters, and then it redirects the browser (via an HTTP POST request) to the URL https://www.commerzbank.de/rates/do.rates. This new page loads a huge chunk of JavaScript (1923 lines of code, as formatted by http://jsbeautifier.org/) that is responsible for downloading the data and generating the HTML to display it. This code uses the cookies and parameters set by the original page to determine which data to download and display.
I've tried many things in R to get the data from this web page. I won't list all the crazy stuff I tried because it would be too long (and sometimes embarrassing), but I have played with most functions of RCurl and other packages (repmis, scrapeR, httr, rjson, among others). Nothing seems to work, because none of these packages appears to have a way to (at least automatically) run the JavaScript code that downloads the data.
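Just to give an idea of the kind of thing I mean, replicating the cookie handshake described above with httr would look roughly like this (the POST body below is a made-up placeholder, since the real parameters are generated by the page's JavaScript), and it still doesn't return the rendered table:

library(httr)

# one handle so the cookies set by the first request are reused by the second
h <- handle("https://www.commerzbank.de")

# step 1: the JSP page that sets the session cookies
res1 <- GET(handle = h,
            path = "/de/hauptnavigation/kunden/kursinfo/devisenk/weitere_waehrungen___indikative_kurse/indikative_kurse.jsp")

# step 2: the POST the browser is then redirected to
# (the body parameters here are hypothetical placeholders)
res2 <- POST(handle = h,
             path = "/rates/do.rates",
             body = list(param = "value"),
             encode = "form")

content(res2, as = "text")  # still no table: it is built client-side by the JavaScript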
Is there any package/hidden function that would help me accomplish this?
Thanks in advance.
Assuming that you want to scrape the data from the table in the middle of the page, here is a solution using RSelenium.
library(RSelenium)
library(magrittr)
library(XML)       # for htmlParse() and readHTMLTable()

base_url <- "https://www.commerzbank.de/de/hauptnavigation/kunden/kursinfo/devisenk/weitere_waehrungen___indikative_kurse/indikative_kurse.jsp"

checkForServer()   # download the Selenium server binary if it is not there yet
startServer()      # start the Selenium server

remDrv <- remoteDriver()
remDrv$open()
remDrv$navigate(base_url)

# the page source now contains the table rendered by the JavaScript
remDrv$getPageSource()[[1]] %>% htmlParse %>%
  readHTMLTable(header = TRUE) %>%
  extract2(1) %>% head
# ISO Land Mittelkurs Geld Brief
# 1 AFN Afghanistan 66,6600 65,6600 67,6600
# 2 ALL Albanien 140,2300 137,7300 142,7300
# 3 AMD Armenien 553,6000 523,6000 583,6000
# 4 ANG Curaçao, St. Martin (südl. Teil) 2,0392 1,9892 2,0892
# 5 AOA Angola 119,7755 116,7755 122,7755
# 6 ARS Argentinien 9,9598 9,8798 10,0398
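If you need the rates as numbers rather than strings, you can convert the German decimal commas afterwards, for example (column positions taken from the output above):

rates <- remDrv$getPageSource()[[1]] %>% htmlParse %>%
  readHTMLTable(header = TRUE) %>%
  extract2(1)

# "66,6600" -> 66.66: swap the German decimal comma for a point
rates[3:5] <- lapply(rates[3:5],
                     function(x) as.numeric(gsub(",", ".", as.character(x))))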
RSelenium even supports headless browsing leveraging PhantomJS as described in this vignette.
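A minimal sketch of that headless variant, assuming the Selenium server started above is still running and the phantomjs binary is on your PATH:

# connect to the running Selenium server, but drive PhantomJS instead of a GUI browser
remDrv <- remoteDriver(browserName = "phantomjs")
remDrv$open()
remDrv$navigate(base_url)
# ... parse the page source exactly as above ...
remDrv$close()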