Web scraping data inside a tab using RSelenium


I want to scrape the summary table under Player statistics in the following page: https://www.sofascore.com/southampton-wolverhampton/dsV

I am trying to use RSelenium for this purpose.

Here is my code so far:

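    library(RSelenium)
    library(netstat)  # assumed source of free_port()

    # start the Selenium server and a Chrome client on a free port
    rm <- rsDriver(browser = "chrome", chromever = "111.0.5563.64",
                   verbose = FALSE,
                   port = free_port())

    rmDr <- rm$client
    rmDr$open()
    rmDr$navigate("https://www.sofascore.com/southampton-wolverhampton/dsV")
    # try to locate the Summary tab button (this is the call that fails)
    elem <- rmDr$findElement(using = 'xpath', '//button[@data-tabid="summary"]')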

The summary data appears when I click the Summary button, so I used XPath to locate that button as shown above, but it didn't work.

Could you suggest any alternative way?

Thank you.

This is the error I got:

Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//button[@data-tabid="summary"]"}
  (Session info: chrome=111.0.5563.65)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'DESKTOP-MOGN5AG', ip: '192.168.0.114', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '19.0.2'
Driver info: driver.version: unknown

Error:   Summary: NoSuchElement
     Detail: An element could not be located on the page using the given search parameters.
     class: org.openqa.selenium.NoSuchElementException
     Further Details: run errorDetails method


Solution

  • I clicked on the Summary tab using a CSS selector (the class name .fircAT looks auto-generated, so it may change over time):

    remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
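
If the click fails with NoSuchElement because the tab has not rendered yet, polling for the element before clicking may help. The following is a minimal sketch, not part of RSelenium: wait_for_elements is a hypothetical helper, and the timeout and interval values are arbitrary assumptions. It relies on findElements returning an empty list when nothing matches.

```r
# Hypothetical helper: poll `find_fn` until it returns a non-empty list
# or `timeout` seconds have passed.
wait_for_elements <- function(find_fn, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat lookup errors the same as "nothing found yet"
    found <- tryCatch(find_fn(), error = function(e) list())
    if (length(found) > 0) return(found)
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for elements")
    }
    Sys.sleep(interval)
  }
}

# Intended use with a live session (not run here):
# tabs <- wait_for_elements(function() {
#   remDr$findElements(using = "css", value = ".fircAT > div:nth-child(2)")
# })
# tabs[[1]]$clickElement()
```

The same helper could be called again after the click, with a selector for the table, before pulling the page source.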
    

    Once the page had switched tabs, I pulled the page's HTML and searched it for the table node. Here's the entire code:

    # load libraries
    library(RSelenium)
    library(rvest)
    library(magrittr)
    
    # define target url
    url <- "https://www.sofascore.com/southampton-wolverhampton/dsV"
    
    
    # start RSelenium ------------------------------------------------------------
    
    rD <- rsDriver(browser="firefox", port=4550L, chromever = NULL)
    remDr <- rD[["client"]]
    
    # open the remote driver-------------------------------------------------------
    # (rsDriver() normally opens a session already; this call starts a fresh one)
    remDr$open()
    
    # Navigate to webpage -----------------------------------------------------
    remDr$navigate(url)
    
    
    # click on the summary tab ------------------------------------
    remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
    
    
    
    # pull the webpage html
    # then read it
    page_html <- remDr$getPageSource()[[1]] %>% 
      read_html() 
    
    
    
    # find table elements
    tables <- page_html %>% html_table()
    
    summary_stats_table <- tables[[1]]
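
Taking tables[[1]] depends on the summary table being the first table on the page. A possibly more robust variant is to pick the table by its column names. This is only a sketch: the HTML string is a toy stand-in for remDr$getPageSource()[[1]], and has_cols is a hypothetical helper.

```r
library(rvest)

# Toy HTML standing in for the real page source; the real markup differs.
html <- read_html('
  <table><tr><th>Team</th><th>Score</th></tr>
         <tr><td>A</td><td>1</td></tr></table>
  <table><tr><th>Player</th><th>Goals</th><th>Assists</th></tr>
         <tr><td>P1</td><td>0</td><td>0</td></tr></table>')

tables <- html_table(html)

# Hypothetical helper: TRUE if the table has all the expected columns
has_cols <- function(tbl, cols) all(cols %in% names(tbl))

# Keep only tables that look like the player statistics table
matches <- Filter(function(t) has_cols(t, c("Goals", "Assists")), tables)

stopifnot(length(matches) >= 1)
summary_stats_table <- matches[[1]]
```

On the live page this should select the same table that tables[[1]] returned when the answer was written.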
    

    Here's what it looks like:

    summary_stats_table
    # A tibble: 32 × 12
       ``    `+`    Goals Assists Tackles Acc. …¹ Duels…² Groun…³ Aeria…⁴ Minut…⁵ Posit…⁶
       <lgl> <chr>  <int>   <int>   <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
     1 NA    Moham…     0       0       4 22/32 … 11 (7)  6 (4)   5 (3)   90'     D      
     2 NA    Jan B…     0       0       2 19/30 … 6 (6)   2 (2)   4 (4)   90'     D      
     3 NA    Adama…     0       0       1 11/19 … 11 (7)  11 (7)  0 (0)   45'     F      
     4 NA    Craig…     0       0       1 54/61 … 12 (7)  4 (3)   8 (4)   90'     D      
     5 NA    João …     1       0       1 8/11 (… 7 (2)   5 (2)   2 (0)   20'     M      
     6 NA    Ainsl…     0       0       4 24/36 … 10 (9)  7 (6)   3 (3)   90'     D      
     7 NA    James…     0       0       0 35/42 … 8 (4)   5 (1)   3 (3)   90'     M      
     8 NA    João …     0       0       3 10/12 … 4 (3)   4 (3)   0 (0)   45'     M      
     9 NA    Carlo…     1       0       1 22/26 … 14 (4)  13 (4)  1 (0)   79'     M      
    10 NA    Hugo …     0       0       1 19/20 … 5 (2)   3 (2)   2 (0)   45'     D      
    # … with 22 more rows, 1 more variable: Rating <dbl>, and abbreviated variable names
    #   ¹​`Acc. passes`, ²​`Duels (won)`, ³​`Ground duels (won)`, ⁴​`Aerial duels (won)`,
    #   ⁵​`Minutes played`, ⁶​Position
    # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
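
Several columns come back as text, for example "11 (7)" for Duels (won) and "90'" for Minutes played. Below is a rough base R sketch of turning them into numbers. The data frame is a toy copy of the value shapes printed above, with placeholder player names and made-up column names; the real column names would need to be substituted.

```r
# Toy rows mimicking the printed value formats; names are placeholders.
raw <- data.frame(
  player  = c("Player A", "Player B"),
  duels   = c("11 (7)", "6 (6)"),
  minutes = c("90'", "45'"),
  stringsAsFactors = FALSE
)

# "11 (7)" -> total = 11, won = 7
pattern <- "^([0-9]+) \\(([0-9]+)\\)$"
parts <- regmatches(raw$duels, regexec(pattern, raw$duels))
raw$duels_total <- as.integer(vapply(parts, `[`, character(1), 2))
raw$duels_won   <- as.integer(vapply(parts, `[`, character(1), 3))

# "90'" -> 90
raw$minutes_played <- as.integer(sub("'$", "", raw$minutes))
```

Values that do not match the pattern end up as NA rather than raising an error, which makes unexpected formats easy to spot.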