library(httr)
library(XML)

url <- "https://finance.yahoo.com/calendar/earnings?from=2022-12-04&to=2022-12-10&day=2022-12-06"

# Download a page and return every HTML table found in it
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}

tables <- download_table(url)
I used this for Yahoo and it worked. But when I tried the same code for:
url <- "https://www.benzinga.com/calendars/earnings"

download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}

tables <- download_table(url)
I got no usable tables, only this:
> print(head(tables))
Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> tables$`NULL`
Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
If I search the page source for the tickers, for example, I can't find them, so I can't use the rvest package to scrape them.
Does anyone have an idea how to do this with Benzinga?
Thank you and KR
Web Scraping Benzinga Earnings Calendar with rvest and httr
The data is not in the page's HTML. It is loaded from an API, which you can see in the Network tab of the browser's developer tools (right-click the page and choose Inspect).
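If you want to confirm this from R, one rough way is to count the data rows in the HTML the server actually returns. This is only a sketch: the helper name count_table_rows and both HTML snippets are made up for illustration, and on the real page you would pass in the text of the downloaded response instead.

```r
library(rvest)

# Hypothetical helper: count the <tr> rows inside <tbody> in a piece of HTML text.
count_table_rows <- function(html_text) {
  page <- read_html(html_text)
  length(html_elements(page, "table tbody tr"))
}

# Made-up snippets, not copied from any real page.
# A table that is filled in later by JavaScript arrives with headers only:
static_html <- paste0(
  "<html><body><table>",
  "<thead><tr><th>Date</th><th>ticker</th></tr></thead>",
  "<tbody></tbody>",
  "</table></body></html>"
)

# The same table once it contains a data row:
rendered_html <- paste0(
  "<html><body><table>",
  "<thead><tr><th>Date</th><th>ticker</th></tr></thead>",
  "<tbody><tr><td>2022-12-06</td><td>ABC</td></tr></tbody>",
  "</table></body></html>"
)

count_table_rows(static_html)
count_table_rows(rendered_html)
```

If the count for the downloaded page is zero while the browser shows rows, the rows are being added in the browser, and the place to look is the request that delivers them.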
The link is as follows:
You can then create a function that alters the dates and filters for the tickers ([tickers]) of interest. I wrote one here as a suggestion with httr2, where the function takes from_date and to_date as input.
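library(httr2)
library(purrr)   # pluck() and the %>% pipe
library(tibble)  # as_tibble()

get_earnings <- function(from_date, to_date) {
  # [The start of this expression, which builds the API URL from from_date
  #  and to_date, is missing from the post; only its closing parenthesis remains.]
  ) %>%
    request() %>%
    req_headers(accept = "application/json") %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("earnings") %>%
    as_tibble() %>%
  # [The last step of the pipeline is also missing from the post.]
}

get_earnings(from_date = "2023-01-01", to_date = "2023-01-25")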
# A tibble: 387 × 25
currency date date_confirmed eps eps_est eps_prior eps_surprise eps_surprise_per…
<chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 USD 2023-01-25 1 0.91 0.58 0.57 0.33 0.569
2 USD 2023-01-25 1 NA 1.27 1.42 NA NA
3 USD 2023-01-25 1 1 0.97 0.92 0.03 0.0309
4 USD 2023-01-25 1 1.01 1.13 0.95 -0.12 -0.106
5 USD 2023-01-25 1 0.69 NA 0.93 NA NA
6 USD 2023-01-25 1 0.12 0.13 0.16 -0.01 -0.0769
7 USD 2023-01-25 1 1.5 1.43 1.05 0.07 0.049
8 USD 2023-01-25 1 1.1 0.98 0.69 0.12 0.122
9 USD 2023-01-25 1 0.02 0.01 -0.65 0.01 1
10 USD 2023-01-25 1 0.42 0.44 0.5 -0.02 -0.0455
# … with 377 more rows, and 17 more variables: eps_type <chr>, exchange <chr>, id <chr>,
# importance <int>, name <chr>, notes <chr>, period <chr>, period_year <int>,
# revenue <dbl>, revenue_est <dbl>, revenue_prior <dbl>, revenue_surprise <dbl>,
# revenue_surprise_percent <dbl>, revenue_type <chr>, ticker <chr>, time <time>,
# updated <int>
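
To narrow the result down to the tickers of interest afterwards, you could filter on the ticker column shown above. A minimal sketch, assuming the result keeps that column name: the helper filter_tickers is hypothetical, and the three rows below are invented purely so the example runs without calling the API.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: keep only the rows whose ticker is in `tickers`.
filter_tickers <- function(earnings, tickers) {
  filter(earnings, ticker %in% tickers)
}

# Made-up rows for illustration only; real data would come from get_earnings().
example_earnings <- tibble(
  ticker = c("AAA", "BBB", "CCC"),
  date   = as.Date(c("2023-01-25", "2023-01-25", "2023-01-24")),
  eps    = c(0.91, NA, 1.00)
)

filter_tickers(example_earnings, c("AAA", "CCC"))
```

Filtering after the download keeps the request function simple. If the API itself accepts a tickers parameter, filtering on the server side would transfer less data.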