I want to scrape the summary table under Player statistics on the following page: https://www.sofascore.com/southampton-wolverhampton/dsV
I am trying to use RSelenium for this purpose.
Here is my code so far:
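library(RSelenium)
library(netstat)  # provides free_port()
# start a Selenium server and a Chrome client on a free port
rm <- rsDriver(browser = "chrome", chromever = "111.0.5563.64",
               verbose = FALSE,
               port = free_port())
rmDr <- rm$client
rmDr$open()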
rmDr$navigate("https://www.sofascore.com/southampton-wolverhampton/dsV")
elem <- rmDr$findElement(using = 'xpath', '//button[@data-tabid="summary"]')
The summary data appears when I click the Summary button, so I used an XPath to locate that button as above, but it didn't work.
Could you suggest an alternative approach?
Thank you.
This is the error I got:
Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//button[@data-tabid="summary"]"}
(Session info: chrome=111.0.5563.65)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'DESKTOP-MOGN5AG', ip: '192.168.0.114', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '19.0.2'
Driver info: driver.version: unknown
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
I clicked on the summary tab using this:
remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
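If the tab has not been rendered yet when findElement() runs, the lookup may fail with a NoSuchElement error. One way to guard against that is to retry the lookup for a few seconds. The sketch below is only an illustration: retry_until() is a hypothetical helper, not part of RSelenium, and fake_find() stands in for the real lookup so that the block runs without a browser.

```r
# Hypothetical helper (not part of RSelenium): call `find` repeatedly until it
# stops throwing an error, or give up once `timeout` seconds have elapsed.
retry_until <- function(find, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(find(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (Sys.time() >= deadline) stop("timed out waiting for element")
    Sys.sleep(interval)
  }
}

# Stand-in for a slow page: fails twice, then "finds" the element.
attempts <- 0
fake_find <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("no such element")
  "summary tab"
}

found <- retry_until(fake_find, timeout = 5, interval = 0.1)
```

With a live session, the same helper could wrap the real lookup, for example retry_until(function() remDr$findElement(using = "css", value = ".fircAT > div:nth-child(2)"))$clickElement(). Note that a class name like .fircAT appears to be auto-generated, so the selector may need updating if the site changes.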
Then, after the page switched tabs, I pulled the page's HTML and searched it for table nodes. Here's the entire code:
# load libraries
library(RSelenium)
library(rvest)
library(magrittr)
# define target url
url <- "https://www.sofascore.com/southampton-wolverhampton/dsV"
# start RSelenium ------------------------------------------------------------
rD <- rsDriver(browser="firefox", port=4550L, chromever = NULL)
remDr <- rD[["client"]]
# open the remote driver-------------------------------------------------------
remDr$open()
# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)
# click on the summary tab ------------------------------------
remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
# pull the webpage html
# then read it
page_html <- remDr$getPageSource()[[1]] %>%
read_html()
# find table elements
tables <- page_html %>% html_table()
summary_stats_table <- tables[[1]]
Here's what it looks like:
summary_stats_table
# A tibble: 32 × 12
`` `+` Goals Assists Tackles Acc. …¹ Duels…² Groun…³ Aeria…⁴ Minut…⁵ Posit…⁶
<lgl> <chr> <int> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 NA Moham… 0 0 4 22/32 … 11 (7) 6 (4) 5 (3) 90' D
2 NA Jan B… 0 0 2 19/30 … 6 (6) 2 (2) 4 (4) 90' D
3 NA Adama… 0 0 1 11/19 … 11 (7) 11 (7) 0 (0) 45' F
4 NA Craig… 0 0 1 54/61 … 12 (7) 4 (3) 8 (4) 90' D
5 NA João … 1 0 1 8/11 (… 7 (2) 5 (2) 2 (0) 20' M
6 NA Ainsl… 0 0 4 24/36 … 10 (9) 7 (6) 3 (3) 90' D
7 NA James… 0 0 0 35/42 … 8 (4) 5 (1) 3 (3) 90' M
8 NA João … 0 0 3 10/12 … 4 (3) 4 (3) 0 (0) 45' M
9 NA Carlo… 1 0 1 22/26 … 14 (4) 13 (4) 1 (0) 79' M
10 NA Hugo … 0 0 1 19/20 … 5 (2) 3 (2) 2 (0) 45' D
# … with 22 more rows, 1 more variable: Rating <dbl>, and abbreviated variable names
# ¹`Acc. passes`, ²`Duels (won)`, ³`Ground duels (won)`, ⁴`Aerial duels (won)`,
# ⁵`Minutes played`, ⁶Position
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
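Taking tables[[1]] relies on the summary table being the first table on the page. A slightly more defensive variant is to pick the table by its column names. This is a minimal sketch, assuming the table of interest has Goals and Assists columns as in the output above; the toy HTML is made up and only stands in for the real page source.

```r
library(rvest)

# Toy page standing in for the real page source (its structure is an assumption).
page_html <- minimal_html('
  <table>
    <tr><th>Team</th><th>Pts</th></tr>
    <tr><td>A</td><td>3</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Goals</th><th>Assists</th></tr>
    <tr><td>Player 1</td><td>1</td><td>0</td></tr>
  </table>')

# Parse every <table> on the page into a list of tibbles.
tables <- html_table(page_html)

# Keep the first table whose header contains every expected column.
wanted <- c("Goals", "Assists")
has_cols <- vapply(tables, function(t) all(wanted %in% names(t)), logical(1))
summary_stats_table <- tables[[which(has_cols)[1]]]
```

If no table matches, the last line fails with a subscript error, which at least makes a layout change on the page visible instead of silently returning the wrong table.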