I am new to web scraping with R and rvest. rvest handles static HTML well, but I have found that it struggles to scrape data from heavily JavaScript-based sites.
I found some articles and blog posts, but they seem deprecated, like https://awesomeopensource.com/project/yusuzech/r-web-scraping-cheat-sheet
In my case I want to scrape odds from sports betting sites, but in my opinion this isn't possible with rvest and SelectorGadget because of the JavaScript.
There is an article from 2018 about scraping odds from PaddyPower (https://www.r-bloggers.com/how-to-scrape-data-from-a-javascript-website-with-r/), but it is outdated too, because PhantomJS is no longer maintained. RSelenium seems to be an option, but the repo has many open issues: https://github.com/ropensci/RSelenium.
So is it possible to work with RSelenium in its current state, or what options do I have instead of RSelenium?
Kind regards
I've had no problems using RSelenium with the help of the wdman package, which allowed me to not bother with Docker at all. wdman also fetches all the binaries you need if they aren't already available. It's nice magic.
Here's a simple script that spins up a Selenium instance with Chrome, opens a site, gets the page contents as an XML tree, and then shuts everything down again.
library(wdman)
library(RSelenium)
library(xml2)
# start a Selenium server with wdman, using the latest Chrome version
selServ <- wdman::selenium(
  port = 4444L,
  version = 'latest',
  chromever = 'latest'
)
# connect a Chrome driver to the Selenium server
remDr <- remoteDriver(
  remoteServerAddr = 'localhost',
  port = 4444L,
  browserName = 'chrome'
)
# open a selenium browser tab
remDr$open()
# navigate to your site (some_url is a placeholder; substitute the page you want)
some_url <- 'https://www.example.com'
remDr$navigate(some_url)
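# JS-heavy pages often render content after the initial load, so give the
# page time to finish. A fixed wait is the simplest approach; five seconds is
# an arbitrary choice, and polling for a known element would be more robust.
Sys.sleep(5)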
# get the html contents of that site as xml tree
page_xml <- xml2::read_html(remDr$getPageSource()[[1]])
# do your magic
# ... see the docs at `?remoteDriver` for everything your remDr object can do.
# clean up after yourself
remDr$close()
selServ$stop()
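Once you have page_xml, the usual rvest selector functions work on it, since rvest operates on xml2 documents. As a minimal sketch (the .odds-value class below is a made-up placeholder; the real selector depends on the betting site's markup, which you'd find with your browser's dev tools):

library(rvest)

# extract the text of the (hypothetical) odds elements
odds <- page_xml %>%
  html_elements('.odds-value') %>%
  html_text2()

Note that page_xml is an in-memory copy of the rendered page, so this parsing still works after you've stopped the Selenium server.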