Tags: r, web-scraping, rvest, httr

Web Scraping Earnings Calendar


library(httr)  # GET()
library(XML)   # htmlParse(), readHTMLTable()

url <- "https://finance.yahoo.com/calendar/earnings?from=2022-12-04&to=2022-12-10&day=2022-12-06"

# Download a page and parse every HTML table on it into a list of data frames
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}

tables <- download_table(url)
print(head(tables))

I used this for Yahoo and it worked. But when I tried the same approach for Benzinga:

url <- "https://www.benzinga.com/calendars/earnings"

# Same download_table() as defined above
tables <- download_table(url)
print(head(tables))
tables$`NULL`

I got no usable tables, only this:

> print(head(tables))
$`NULL`
  Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
  Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert

$`NULL`
  V1   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11  V12  V13
1                                                               
2    <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>

> tables$`NULL`
  Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
  Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
> 

If I search the page source for the tickers, for example, I can't find them. So I can't use the rvest package to scrape them.
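A quick way to confirm this kind of finding is to test whether a known value occurs anywhere in the raw HTML text. The sketch below is only an illustration: the helper name `html_contains` and the inline sample page are made up, and the sample stands in for a page whose table rows are filled in later by JavaScript.

```r
# Hypothetical helper: TRUE if `needle` occurs literally in the raw HTML text.
html_contains <- function(html, needle) {
  grepl(needle, html, fixed = TRUE)
}

# Made-up stand-in for a page that ships only the table header;
# the rows would be added later by a script.
static_html <- "<table><tr><th>ticker</th></tr></table><script src='app.js'></script>"

html_contains(static_html, "ticker")  # the header text is in the markup
html_contains(static_html, "AAPL")    # the ticker value is not

# With a real page, the text could be fetched first, for example:
# html <- httr::content(httr::GET(url), as = "text")
```

If the value is missing from the raw text, the rows are most likely loaded separately after the page is served.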

Does anyone have an idea how to do this with Benzinga?

Thank you and KR

Web Scraping Benzinga Earnings Calendar with rvest and httr


Solution

  • The table is not in the page's HTML. The data is loaded from an API, which you can see in the Network tab of the browser's developer tools (right-click, Inspect).

    The link is as follows:

    https://api.benzinga.com/api/v2.1/calendar/earnings?token=1c2735820e984715bc4081264135cb90&parameters[date_from]=2023-01-25&parameters[date_to]=2023-01-25&parameters[tickers]=&pagesize=1000

    You can then write a function that alters the dates and filters for the tickers ([tickers]) of interest. Here is one as a suggestion, using httr2, that takes from_date and to_date as input.

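    library(tidyverse)  # str_c(), pluck(), as_tibble(), type_convert()
    library(httr2)
    
    # Fetch the earnings calendar between two dates ("YYYY-MM-DD") as a tibble
    get_earnings <- function(from_date, to_date) {
      # Build the API URL with the requested date range
      str_c(
        "https://api.benzinga.com/api/v2.1/calendar/earnings?token=1c2735820e984715bc4081264135cb90&parameters[date_from]=",
        from_date,
        "&parameters[date_to]=",
        to_date,
        "&parameters[tickers]=&pagesize=1000"
      ) %>%
        request() %>%
        req_headers(accept = "application/json") %>%  # ask for JSON
        req_perform() %>%
        resp_body_json(simplifyVector = TRUE) %>%
        pluck("earnings") %>%   # the records sit under the "earnings" key
        as_tibble() %>%
        type_convert()          # guess column types from the character columns
    }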
    
    get_earnings(from_date = "2023-01-01", to_date = "2023-01-25")
    
    # A tibble: 387 × 25
       currency date       date_confirmed   eps eps_est eps_prior eps_surprise eps_surprise_per…
       <chr>    <date>              <int> <dbl>   <dbl>     <dbl>        <dbl>             <dbl>
     1 USD      2023-01-25              1  0.91    0.58      0.57         0.33            0.569 
     2 USD      2023-01-25              1 NA       1.27      1.42        NA              NA     
     3 USD      2023-01-25              1  1       0.97      0.92         0.03            0.0309
     4 USD      2023-01-25              1  1.01    1.13      0.95        -0.12           -0.106 
     5 USD      2023-01-25              1  0.69   NA         0.93        NA              NA     
     6 USD      2023-01-25              1  0.12    0.13      0.16        -0.01           -0.0769
     7 USD      2023-01-25              1  1.5     1.43      1.05         0.07            0.049 
     8 USD      2023-01-25              1  1.1     0.98      0.69         0.12            0.122 
     9 USD      2023-01-25              1  0.02    0.01     -0.65         0.01            1     
    10 USD      2023-01-25              1  0.42    0.44      0.5         -0.02           -0.0455
    # … with 377 more rows, and 17 more variables: eps_type <chr>, exchange <chr>, id <chr>,
    #   importance <int>, name <chr>, notes <chr>, period <chr>, period_year <int>,
    #   revenue <dbl>, revenue_est <dbl>, revenue_prior <dbl>, revenue_surprise <dbl>,
    #   revenue_surprise_percent <dbl>, revenue_type <chr>, ticker <chr>, time <time>,
    #   updated <int>
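
The function above leaves the `[tickers]` parameter empty. As a rough sketch of how the ticker filter might be added, the base R helper below only assembles the URL. The name `build_earnings_url`, the `tickers` and `token` arguments, and the placeholder token are made up for illustration, and it assumes the API accepts a comma-separated list of symbols in `parameters[tickers]`, which is worth checking against the request shown in the network tab.

```r
# Hypothetical helper: build the earnings API URL for a date range and an
# optional ticker filter. Assumes parameters[tickers] takes comma-separated symbols.
build_earnings_url <- function(from_date, to_date, tickers = character(), token) {
  # as.Date() stops with an error on input that is not a recognisable date
  from_date <- format(as.Date(from_date), "%Y-%m-%d")
  to_date   <- format(as.Date(to_date), "%Y-%m-%d")
  paste0(
    "https://api.benzinga.com/api/v2.1/calendar/earnings",
    "?token=", token,
    "&parameters[date_from]=", from_date,
    "&parameters[date_to]=", to_date,
    "&parameters[tickers]=", paste(tickers, collapse = ","),
    "&pagesize=1000"
  )
}

# "YOUR_TOKEN" is a placeholder, not a working token
build_earnings_url("2023-01-01", "2023-01-25", c("AAPL", "MSFT"), token = "YOUR_TOKEN")
```

With no tickers supplied, the parameter stays empty, which gives the same URL shape as the one used in get_earnings(). The resulting string could be passed to request() in the same way.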