javascript r web-scraping

Scraping an XML/JavaScript table with R


I want to scrape a table like the one at http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC/, specifically the bookmakers and the odds. The problem is that I don't know what kind of table it is or how to scrape it.

These threads might help ("Scraping javascript with R" and "What type of HTML table is this and what type of webscraping techniques can you use?"), but I'd appreciate it if someone could point me in the right direction or, better yet, give instructions here.

So what kind of table is that odds table? Is it possible to scrape it with R and, if so, how?

Edit: I should have been clearer. I have been scraping data with R for some time and probably don't need help with the basics. On closer inspection, the table is indeed generated by JavaScript; that is the problem I need help with.


Solution

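  • You can use Selenium and RSelenium to get the relevant data:

    library(RSelenium)
    library(XML)  # provides readHTMLTable()
    appURL <- "http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC"
    # Start a local Selenium server. startServer() was made defunct in later
    # RSelenium releases, which provide rsDriver() instead.
    RSelenium::startServer()
    remDr <- remoteDriver()
    remDr$open()
    # Load the page in a real browser so its JavaScript builds the odds table.
    remDr$navigate(appURL)
    # Return the rendered HTML of the first element of `tbls`, which must
    # already exist in the page's JavaScript context.
    tblSource <- remDr$executeScript("return tbls[0].outerHTML;")[[1]]
    readHTMLTable(tblSource)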
    $`NULL`
         Bookmakers    1    X    2 Payout
    1   bet-at-home 2.25 3.80 2.60  91.6%
    2        bet365 2.29 3.79 2.64  92.7%
    3       Betsson 2.35 3.75 2.65  93.5%
    4          bwin 2.30 3.75 2.70  93.3%
    5   MarathonBet 2.35 3.80 2.78  95.4%
    6      Titanbet 2.30 3.95 2.50  91.9%
    7       TonyBet 2.35 3.70 2.70  93.8%
    8        Unibet 2.35 3.85 2.60  93.5%
    9  William Hill 2.30 3.90 2.50  91.6%
    10       Winner 2.30 3.95 2.50  91.9%
    11       youwin 2.40 3.75 2.55  93.0%
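
The table comes back as text, so the odds and payout still need converting before any arithmetic, and some bookmaker names may carry stray non-breaking spaces from the page markup. Below is a minimal base-R sketch of that clean-up. The `odds` data frame is a hypothetical stand-in built from the first three rows of the output above, the bet365 padding is an assumption about what the page contains, and `clean_odds` is an illustrative helper name, not part of RSelenium or XML.

```r
# Hypothetical stand-in for the first element of readHTMLTable(tblSource);
# values are copied from the first three rows of the printed output.
odds <- data.frame(
  Bookmakers = c("bet-at-home", "\u00a0bet365\u00a0\u00a0", "Betsson"),
  `1` = c("2.25", "2.29", "2.35"),
  X = c("3.80", "3.79", "3.75"),
  `2` = c("2.60", "2.64", "2.65"),
  Payout = c("91.6%", "92.7%", "93.5%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Illustrative helper (assumed name): tidy names and convert text to numbers.
clean_odds <- function(df) {
  # Replace non-breaking spaces with plain spaces, then trim the ends.
  names_txt <- gsub("\u00a0", " ", as.character(df$Bookmakers), fixed = TRUE)
  df$Bookmakers <- trimws(names_txt)
  # Odds columns arrive as text (or factors); convert them to numeric.
  for (col in c("1", "X", "2")) {
    df[[col]] <- as.numeric(as.character(df[[col]]))
  }
  # Turn a string such as "91.6%" into the fraction 0.916.
  pct <- sub("%", "", as.character(df$Payout), fixed = TRUE)
  df$Payout <- as.numeric(pct) / 100
  df
}

cleaned <- clean_odds(odds)
print(cleaned)
```

With a live session, the same helper would be applied to `readHTMLTable(tblSource)[[1]]`, assuming the column names match those shown in the output.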