
Need help scraping table data from this website


I want to scrape the table data from this website using R.

My code:

library(XML)
url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"
doc <- htmlParse(url)
tableNodes = getNodeSet(doc,"//table")
tb = readHTMLTable(tableNodes[[1]])
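For reference, the XML-based approach above can probably be rescued. A likely cause of the error is that htmlParse() downloads the URL through libxml2's own network code, which generally does not handle https. A common workaround is to fetch the page with base R (readLines() can read https URLs when R is built with libcurl) and pass the HTML text to htmlParse() with asText = TRUE. This is a minimal sketch, assuming the first table on the page is the one you want; read_first_table() is a hypothetical helper name, not part of the XML package.

```r
library(XML)

# Hypothetical helper: let base R download the page (or read a local file),
# then give the HTML *text* to XML's parser so htmlParse() never opens the URL.
read_first_table <- function(src) {
  html <- paste(readLines(src, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
  doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  # Same steps as the original code: find all <table> nodes, read the first one
  table_nodes <- getNodeSet(doc, "//table")
  if (length(table_nodes) == 0) stop("no <table> element found in ", src)
  readHTMLTable(table_nodes[[1]], stringsAsFactors = FALSE)
}
```

Usage with the URL from the question: tb <- read_first_table("https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash")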

but I got an error (it was posted as a screenshot, which is not reproduced here).


Solution

  • You can do this with the {rvest} package:

    library(rvest)
    
    url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"
    
    tables <- read_html(url) |>
      html_table()
    

    The tables list contains every table found on the page; you can inspect it with str():

    str(tables)
    #> List of 5
    #>  $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
    #>   ..$ Official LME-Prices in US Dollar: chr [1:7] "in US Dollar per ton" "Copper" "Tin" "Lead" ...
    #>   ..$ 07. October 2022                : chr [1:7] "Settlement Kasse" "7,575.50" "20,000.00" "2,078.00" ...
    #>   ..$                                 : chr [1:7] "3 months" "7,554.00" "19,950.00" "2,050.00" ...
    #>   ..$                                 : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
    #>  $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
    #>   ..$ LME stocks      : chr [1:7] "in tons" "Copper" "Tin" "Lead" ...
    #>   ..$ 07. October 2022: chr [1:7] "" "143,775" "4,690" "31,875" ...
    #>   ..$ Changes         : chr [1:7] "" "3,575" "15" "0" ...
    #>   ..$                 : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
    #>  $ : tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
    #>   ..$ Exchange Rates  : chr [1:3] "EUR/USD LME-FX-rate (MTLE)" "ECB-Fixing (14:15 Uhr)" "EUR/USD-Basis DEL-Notiz"
    #>   ..$ 07. October 2022: num [1:3] 0.979 0.98 0.979
    #>   ..$ 06. October 2022: num [1:3] 0.987 0.986 0.987
    #>   ..$                 : logi [1:3] NA NA NA
    #>  $ : tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
    #>   ..$ German Metal Prices: chr [1:15] "in Euro per 100 kg" "lower Copper WM-Notiz" "higher Copper WM-Notiz" "lower DEL-Notiz (until February 11, 2022)" ...
    #>   ..$ 07. October 2022   : chr [1:15] "" "786.54" "789.89" "-" ...
    #>   ..$ 06. October 2022   : chr [1:15] "" "797.52" "800.84" "-" ...
    #>   ..$                    : chr [1:15] "Chart\nTable\nAverage" "" "" "" ...
    #>  $ : tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
    #>   ..$ Precious metals : chr [1:5] "Gold London Fixing in USD/oz." "Gold in Euro/kg" "Gold, processed in Euro/kg" "Fine Silver in Euro/kg" ...
    #>   ..$ 07. October 2022: chr [1:5] "1,711.50" "55,190.00" "62,080.00" "666.90 / 733.80" ...
    #>   ..$ 06. October 2022: chr [1:5] "1,716.00" "54,840.00" "61,670.00" "658.90 / 725.10" ...
    #>   ..$                 : logi [1:5] NA NA NA NA NA
    

    Then pick the table you want and format it as you wish:

    tables[[2]]
    #> # A tibble: 7 × 4
    #>   `LME stocks` `07. October 2022` Changes  ``                     
    #>   <chr>        <chr>              <chr>    <chr>                  
    #> 1 in tons      ""                 ""       "Chart\nTable\nAverage"
    #> 2 Copper       "143,775"          "3,575"  ""                     
    #> 3 Tin          "4,690"            "15"     ""                     
    #> 4 Lead         "31,875"           "0"      ""                     
    #> 5 Zinc         "53,475"           "150"    ""                     
    #> 6 Aluminium    "327,625"          "-1,225" ""                     
    #> 7 Nickel       "52,362"           "942"    ""
    

    Created on 2022-10-09 with reprex v2.0.2
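Because the date headers (e.g. "07. October 2022") change every trading day, it may be more robust to pick a table by its first header rather than by its index, and to address the remaining columns by position. The numbers also come back as strings with thousands separators ("143,775", "-1,225"), so they need converting before any arithmetic. A minimal base R sketch, assuming the LME stocks layout shown in the str() output above stays the same; pick_table(), parse_number_commas() and tidy_stocks() are hypothetical helpers, not part of rvest.

```r
# Hypothetical helper: return the first table whose first column header equals
# `title` (e.g. "LME stocks"), instead of relying on a position like tables[[2]].
pick_table <- function(tables, title) {
  hits <- Filter(function(t) identical(names(t)[1], title), tables)
  if (length(hits) == 0) stop("no table whose first header is ", title)
  hits[[1]]
}

# Strip thousands separators and convert to numeric; blank cells become NA.
# Cells such as "-" (no quote) also end up NA, with a coercion warning.
parse_number_commas <- function(x) {
  x <- gsub(",", "", trimws(x), fixed = TRUE)
  x[x == ""] <- NA
  as.numeric(x)
}

# Assumes the stocks layout: metal, stock level, daily change, link column.
tidy_stocks <- function(tbl) {
  tbl <- as.data.frame(tbl)[-1, 1:3]  # drop the "in tons" unit row and the Chart/Table column
  data.frame(
    metal  = tbl[[1]],
    stock  = parse_number_commas(tbl[[2]]),
    change = parse_number_commas(tbl[[3]]),
    stringsAsFactors = FALSE
  )
}
```

With the tables list from the answer, this would be called as tidy_stocks(pick_table(tables, "LME stocks")). If you already use the tidyverse, readr::parse_number() is an alternative to the hand-written parse_number_commas().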