Tags: html, r, xml, web-scraping, rvest

Web scraping using R: extracting table-like data from a website


I'm having some problems scraping data from a website, and I don't have much experience with web scraping. My plan is to use R to scrape data from the following website: https://www.myfxbook.com/forex-broker-swaps

More precisely, I want to extract the Forex Brokers Swap Comparison for all the available pairs.

My idea so far:

  library(XML)
  url <- paste0("https://www.myfxbook.com/forex-broker-swaps")
  source <- readLines(url, encoding = "UTF-8")
  parsed_doc <- htmlParse(source, encoding = "UTF-8")
  test<-xpathSApply(parsed_doc, path = '/html/body/div[3]/div[6]/div/div/div/div/div/div/div/div/div[3]/div[4]/div/div[2]/div', xmlValue)
  

But this doesn't bring up the intended information. Some help would be really appreciated here! Thanks!


Solution

  • How about this:

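    library(dplyr)
    library(rvest)
    # Download and parse the page
    h <- read_html("https://www.myfxbook.com/forex-broker-swaps")
    h %>% html_table() %>%            # every <table> on the page, as a list of tibbles
      purrr::pluck(3) %>%             # the swap comparison was the third table at the time
      # merge the two header levels: pair name + Short/Long/Type taken from row 1
      setNames(paste(names(.), .[1,], sep="_")) %>% 
      rename("Broker" = "_Broker") %>% 
      filter(Broker != "Broker") %>%  # drop the row that only held the sub-header
      mutate(across(-Broker, as.numeric))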
    #> # A tibble: 91 × 13
    #>    Broker          `EUR/USD_Short` `EUR/USD_Long` `EUR/USD_Type` `GBP/USD_Short`
    #>    <chr>                     <dbl>          <dbl>          <dbl>           <dbl>
    #>  1 Axi                        0.17          -0.56              0           -0.18
    #>  2 Tickmill                   0.24          -0.55              0           -0.22
    #>  3 Blueberry Mark…            0.31          -0.55              0           -0.17
    #>  4 Eightcap                   0.31          -0.55              0           -0.17
    #>  5 Rakuten Securi…            0.19          -0.5               0            0   
    #>  6 ACY Securities            -0.34          -3.75              3           -1.28
    #>  7 AAAFx                      1.98          -6.42              1           -2.07
    #>  8 MultiBank Group            0.3           -0.66              0           -0.12
    #>  9 Just2Trade                 0.12          -0.9               0           -0.26
    #> 10 Fusion Markets             0.31          -0.55              0           -0.15
    #> # … with 81 more rows, and 8 more variables: `GBP/USD_Long` <dbl>,
    #> #   `GBP/USD_Type` <dbl>, `USD/CAD_Short` <dbl>, `USD/CAD_Long` <dbl>,
    #> #   `USD/CAD_Type` <dbl>, `USD/JPY_Short` <dbl>, `USD/JPY_Long` <dbl>,
    #> #   `USD/JPY_Type` <dbl>
    

    Created on 2022-05-25 by the reprex package (v2.0.1)
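
The header-merging step in the pipeline above may be easier to follow on a small offline example. The following is a minimal sketch in base R, not the answer's exact code: the `raw` data frame is a hypothetical stand-in for what `html_table()` returns for a table with a two-row header, and its values are copied from the first two rows of the output above. The names `raw`, `sub_header`, and `clean` are introduced here purely for illustration.

```r
# Hypothetical stand-in for the scraped table: the pair names end up as
# (duplicated) column names, and the second header row
# ("Broker", "Short", "Long") lands in the first data row.
raw <- data.frame(
  c("Broker", "Axi", "Tickmill"),
  c("Short", "0.17", "0.24"),
  c("Long", "-0.56", "-0.55"),
  stringsAsFactors = FALSE
)
names(raw) <- c("", "EUR/USD", "EUR/USD")

# Read the sub-header out of row 1 by column position, so the duplicated
# and empty column names cannot interfere.
sub_header <- vapply(seq_along(raw), function(k) raw[[k]][1], character(1))

# Merge both header levels: "" + "Broker" becomes "_Broker",
# "EUR/USD" + "Short" becomes "EUR/USD_Short", and so on.
names(raw) <- paste(names(raw), sub_header, sep = "_")
names(raw)[names(raw) == "_Broker"] <- "Broker"

# Drop the row that only held the sub-header, then convert every column
# except Broker from character to numeric.
clean <- raw[raw$Broker != "Broker", ]
clean[-1] <- lapply(clean[-1], as.numeric)
rownames(clean) <- NULL

print(clean)
```

The real page may change its layout over time, so the table index passed to `purrr::pluck()` and the `"_Broker"` column name are worth re-checking against the list returned by `html_table()` before relying on them.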