I'm trying to scrape the following URL to obtain live game data: https://egamersworld.com/callofduty/matches. I've inspected the fetch requests the page makes, but none of them obviously returns JSON-formatted data with the page info.
Additionally, I get a 403 Forbidden response when I try to access the site from R. I've also tried replicating the browser's request headers, still with no luck.
I'm no web scraping professional, so I'm curious whether this website requires additional steps, or whether it has protective measures in place that I'm unaware of.
Here is the R code I've tried. I've tried many different headers and header combinations, all of which result in a 403.
Note: I modified the Accept-Encoding header from "gzip, deflate, br, zstd" as having "br, zstd" present gives the error:
"Error in curl::curl_fetch_memory(url, handle = handle) : Unrecognized content encoding type. libcurl understands deflate, gzip content encodings."
library(httr)
url <- "https://egamersworld.com/callofduty/matches"
headers <- add_headers(
  "Accept" = "*/*",
  "Accept-Encoding" = "gzip, deflate",
  "Accept-Language" = "en-US,en;q=0.9",
  "Referer" = "https://egamersworld.com/matches",
  "User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
)
response <- GET(url, headers)
response$status_code
# returns 403
Here is an rvest approach that uses selenider to drive the browser and pulls the data directly from the HTML:
library(selenider)
library(rvest)
session <- selenider_session("selenium", browser = "chrome")
open_url("https://egamersworld.com/callofduty/matches")
# Parse the rendered page and select one node per match
elements <- session |> get_page_source() |> html_elements(".item_teams__cKXQT")
res <- data.frame(
  home_team_name = elements |>
    html_elements(".item_team__evhUQ:nth-child(1) .item_teamName__NSnfH") |>
    html_text(trim = TRUE),
  home_team_odds = elements |>
    html_elements(".item_team__evhUQ:nth-child(1) .item_odd__Lm2Wl") |>
    html_text(trim = TRUE),
  away_team_name = elements |>
    html_elements(".item_team__evhUQ:nth-child(3) .item_teamName__NSnfH") |>
    html_text(trim = TRUE),
  away_team_odds = elements |>
    html_elements(".item_team__evhUQ:nth-child(3) .item_odd__Lm2Wl") |>
    html_text(trim = TRUE),
  match_date = elements |>
    html_elements(".item_scores__Vi7YX .item_date__g4cq_") |>
    html_text(trim = TRUE),
  match_time = elements |>
    html_elements(".item_scores__Vi7YX .item_time__xBia_") |>
    html_text(trim = TRUE),
  match_type = elements |>
    html_elements(".item_scores__Vi7YX .item_bo__u2C9Q") |>
    html_text(trim = TRUE)
)
This gives the 20 results available on the page:
home_team_name | home_team_odds | away_team_name | away_team_odds | match_date | match_time | match_type |
---|---|---|---|---|---|---|
Noctem Esports | 1.8 | Project 7 Esports | 1.8 | 05.03.25 | 20:00 | Bo5 |
Annex Esports | 1.8 | Team Notorious | 1.8 | 05.03.25 | 20:00 | Bo5 |
DIZWRLD | 1.8 | AVNG Esports | 1.8 | 05.03.25 | 20:00 | Bo5 |
Inglorious Gaming | 1.8 | Notorious Gaming | 1.8 | 05.03.25 | 22:00 | Bo5 |
Clutch Rayn Esport | 1.8 | Katana Gaming | 1.8 | 05.03.25 | 22:00 | Bo5 |
Rauzan Esport | 1.8 | Team Bance | 1.8 | 05.03.25 | 22:00 | Bo5 |
OMiT Brooklyn | 1.8 | Team WaR | 1.8 | 06.03.25 | 00:30 | Bo5 |
YFP | 1.8 | Kansas City Pioneers | 1.8 | 06.03.25 | 00:30 | Bo5 |
6F Carolina | 1.8 | Pinnacle | 1.8 | 06.03.25 | 00:30 | Bo5 |
OMiT Brooklyn | 1.8 | Destro Gaming | 1.8 | 06.03.25 | 02:00 | Bo5 |
Luxury Exotics | 2.627 | CABAL Gaming | 1.454 | 06.03.25 | 02:00 | Bo5 |
Royal Spartans | 2 | Lore Gaming | 1.727 | 06.03.25 | 02:00 | Bo5 |
Vancouver Surge | 2.87 | Los Angeles Thieves | 1.41 | 07.03.25 | 21:00 | Bo5 |
Toronto Ultra | 1.03 | Vegas Falcons | 9.07 | 07.03.25 | 22:30 | Bo5 |
Cloud9 New York | 1.8 | Los Angeles Guerrillas M8 | 1.8 | 08.03.25 | 00:00 | Bo5 |
Atlanta FaZe | 1.23 | Minnesota RØKKR | 3.82 | 08.03.25 | 21:00 | Bo5 |
Cloud9 New York | 1.89 | Carolina Royal Ravens | 1.79 | 08.03.25 | 22:30 | Bo5 |
Los Angeles Thieves | 1.06 | Boston Breach | 7 | 09.03.25 | 00:00 | Bo5 |
OpTic Texas | 1.85 | Miami Heretics | 1.85 | 09.03.25 | 01:30 | Bo5 |
Vegas Falcons | 1.8 | Los Angeles Guerrillas M8 | 1.8 | 09.03.25 | 21:00 | Bo5 |
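
Every column in `res` comes back as character, since `html_text()` returns strings. A minimal sketch of converting them to typed columns follows. It assumes the dates are `dd.mm.yy` and treats the times as UTC, neither of which the page confirms; the two sample rows are copied from the table above, and `match_start` and `best_of` are column names made up for this example.

```r
# Two sample rows copied from the table above; in practice use the scraped `res`.
res <- data.frame(
  home_team_name = c("Luxury Exotics", "Toronto Ultra"),
  home_team_odds = c("2.627", "1.03"),
  away_team_name = c("CABAL Gaming", "Vegas Falcons"),
  away_team_odds = c("1.454", "9.07"),
  match_date     = c("06.03.25", "07.03.25"),
  match_time     = c("02:00", "22:30"),
  match_type     = c("Bo5", "Bo5")
)

# Odds: character -> numeric
res$home_team_odds <- as.numeric(res$home_team_odds)
res$away_team_odds <- as.numeric(res$away_team_odds)

# Date and time: combine and parse (ASSUMPTION: dd.mm.yy format, UTC time zone)
res$match_start <- as.POSIXct(
  paste(res$match_date, res$match_time),
  format = "%d.%m.%y %H:%M",
  tz = "UTC"
)

# "Bo5" -> 5 (hypothetical helper column)
res$best_of <- as.integer(sub("^Bo", "", res$match_type))

str(res)
```

If the site shows times in the visitor's local time zone, swap `tz = "UTC"` for the zone of the machine running the browser.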
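In case any of these class names are renamed (the suffixes look auto-generated), you can also try the following, which needs only the container class and splits each element's text into fields: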
library(selenider)
session <- selenider_session("selenium", browser = "chrome")
open_url("https://egamersworld.com/callofduty/matches")
elements <- session |>
  find_elements(".item_teams__cKXQT") |>
  as.list()
# One row per match: split each element's text on newlines
res <- do.call(rbind, lapply(elements, function(x) {
  matrix(strsplit(elem_text(x), "\n")[[1]], nrow = 1)
})) |> as.data.frame()
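
One caveat with the text-splitting approach: `rbind()` stops with an error when the one-row matrices have different numbers of columns, which could happen if some matches lack a field such as odds. A minimal sketch of padding every row to a common width first is shown below. The two sample strings are made up to stand in for `elem_text()` output; the real field order on the page may differ.

```r
# Hypothetical stand-ins for elem_text(x); the second match has no odds.
texts <- c(
  "Team A\n1.8\n05.03.25\n20:00\nBo5\nTeam B\n1.8",
  "Team C\n06.03.25\n00:30\nBo5\nTeam D"
)

parts <- strsplit(texts, "\n", fixed = TRUE)
width <- max(lengths(parts))

# Extending a vector's length pads it with NA
padded <- lapply(parts, function(p) {
  length(p) <- width
  p
})

res <- as.data.frame(do.call(rbind, padded))
res
```

Padding keeps the code from failing, but shorter rows end up misaligned with the longer ones, so check which field is missing before relying on column positions.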
Using Browser Tools (F12) you only need the name of
If you want, you can also fetch real-time match data over the site's WebSocket:
library(websocket)
library(jsonlite)
ws <- websocket::WebSocket$new("wss://ws.egamersworld.com/socket.io/?EIO=3&transport=websocket", autoConnect = FALSE)
all_messages <- list()
ws$onOpen(function(event) { cat("Connection opened\n") })
ws$onError(function(event) { cat("Error occurred:\n"); print(event) })
# Collect every raw frame as it arrives
ws$onMessage(function(event) {
  cat("Message received\n")
  all_messages <<- c(all_messages, list(event$data))
  cat("Messages collected:", length(all_messages), "\n")
})
# On close, dump everything collected so far to a timestamped JSON file
ws$onClose(function(event) {
  cat("Connection closed. Saving data to json...\n")
  output_file <- paste0("egamersworld_data_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".json")
  writeLines(toJSON(all_messages, auto_unbox = TRUE), output_file)
  cat("All messages saved to:", output_file, "\n")
})
ws$connect() # connect and start listening
# Run this later, once enough messages have been collected;
# closing right after connect() would end the session before any data arrives.
ws$close()
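
The frames collected in `all_messages` are raw Socket.IO text frames, not plain JSON: each starts with a numeric packet-type prefix (for example `0` for the handshake, `40` for a namespace connect, `42` for an event), followed by an optional JSON payload. A minimal sketch of splitting the prefix from the payload follows. `parse_frame` is a hypothetical helper, the sample frames and the `"update"` event name are made up, and frames carrying a namespace (such as `42/ns,[...]`) are not handled.

```r
library(jsonlite)

# Hypothetical helper: split a Socket.IO text frame into its numeric
# packet-type prefix and its parsed JSON payload (NULL if there is none).
parse_frame <- function(frame) {
  code <- regmatches(frame, regexpr("^[0-9]+", frame))
  payload <- sub("^[0-9]+", "", frame)
  list(
    code = if (length(code) == 1) code else NA_character_,
    data = if (nzchar(payload)) fromJSON(payload, simplifyVector = FALSE) else NULL
  )
}

# Made-up sample frames standing in for the contents of `all_messages`
frames <- list(
  '0{"sid":"abc","pingInterval":25000}',
  "40",
  '42["update",{"id":1,"score":"2:1"}]'
)

parsed <- lapply(frames, parse_frame)

# Keep only event frames ("42"); each payload is list(event_name, event_data)
events <- Filter(function(f) identical(f$code, "42"), parsed)
events[[1]]$data[[1]]  # event name
events[[1]]$data[[2]]  # event data as a named list
```

Which event names the server actually emits, and what their payloads look like, has to be read off the collected frames.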