I want to scrape the match time and date from this URL:
http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary
Using Chrome DevTools, I can see that the date is shown in the following element:
<td colspan="3" id="utime" class="mstat-date">01:20 AM, October 29, 2014</td>
But this element is not in the page's HTML source.
I think this is because it's generated by JavaScript (correct me if I'm wrong). How can I scrape this information using R?
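That guess is essentially right, except that it is JavaScript (not Java) that fills in the date after the page loads. rvest only parses the markup it downloads and never runs scripts, so it cannot see such content. The sketch below uses a small hypothetical page (not the real scoreboard.com markup) in which a `<script>` fills in the `#utime` cell. Parsed with rvest, the cell comes back empty:

```r
library(rvest)  # read_html(), html_node(), html_text(); also re-exports %>%

# Hypothetical page: the <td> is empty in the HTML source, and a <script>
# would fill it in only when a browser actually runs the page.
static_html <- '<html><body>
<table><tr><td colspan="3" id="utime" class="mstat-date"></td></tr></table>
<script>
document.getElementById("utime").textContent = "01:20 AM, October 29, 2014";
</script>
</body></html>'

# read_html() parses the markup only; the <script> is never executed
pg <- read_html(static_html)
utime_text <- pg %>% html_node("#utime") %>% html_text()
print(utime_text)  # an empty string, because nothing ran the JavaScript
```

On the real site the element may be missing from the source entirely rather than empty. Either way, a tool that executes the page's JavaScript (a headless browser) is needed before rvest can extract the value.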
So, RSelenium is not the only answer (anymore). If you can install the PhantomJS binary (available from http://phantomjs.org/), you can use it to render the page, JavaScript included, and then scrape the resulting HTML with rvest.
This is similar to the RSelenium approach but doesn't require Java:
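library(rvest)  # also re-exports the %>% pipe
# render the HTML (including JavaScript-generated content) with PhantomJS
url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"
# write a small PhantomJS script that opens the page and prints the rendered DOM
writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // rendered page source
  phantom.exit();
});", url), con = "scrape.js")
# run PhantomJS (the binary must be on your PATH) and save its output to scrape.html
system2("phantomjs", "scrape.js", stdout = "scrape.html")
# extract the content you need (read_html() replaces the deprecated html())
pg <- read_html("scrape.html")
pg %>% html_nodes("#utime") %>% html_text()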
## [1] "10:20 AM, October 28, 2014"
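Once you have the text, you may want it as a real date-time rather than a string. A minimal sketch, assuming the format stays exactly as shown above ("HH:MM AM, Month DD, YYYY") with English month names and AM/PM markers. The string carries no time zone, so `tz = "UTC"` is only a placeholder assumption; set it to whatever zone the rendered page actually uses. `parse_match_time` is a hypothetical helper name, not part of rvest:

```r
# Hypothetical helper: parse strings like "10:20 AM, October 28, 2014".
# %B (month name) and %p (AM/PM) are locale-dependent, so LC_TIME is
# switched to "C" (English names) while parsing and restored afterwards.
parse_match_time <- function(x, tz = "UTC") {
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(x, format = "%I:%M %p, %B %d, %Y", tz = tz)
}

match_time <- parse_match_time("10:20 AM, October 28, 2014")
print(format(match_time, "%Y-%m-%d %H:%M"))
```

Strings that do not match the format come back as `NA`, which makes it easy to notice if the site changes how it displays the date.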