
Download API JSON file in R and convert to dataframe


I'm trying to pull down the JSON file from the NASDAQ request URL in R, but I've been unsuccessful. I've been able to craft a solution in Python, but I'm really interested in finding a solution in R, if possible.

I've reached out to the NASDAQ helpdesk and accessing the data via the request API URL is not something they support.

I've seen several solutions on Stack Overflow, but none of them appear to work in my particular situation. Here is the simplest attempt and the result I received:

library(jsonlite)
data <- fromJSON('https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09')

Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning message:
In open.connection(con, "rb") :
  InternetOpenUrl failed: 'The operation timed out'

I don't believe the above attempt is the correct approach, so I'm hoping someone sees this and has a solution that will work.

My expected result is that the JSON would be retrieved from the request API URL (https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09) and converted into an R dataframe.


Solution

  • Indeed, the site is (intentionally?) hanging on queries that lack certain request headers.

    TL;DR

    library(httr)
    url <- 'https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09'
    response <- GET(url, add_headers(
      `User-Agent`= "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0"
      ,`Accept-Language`= "en-US,en;q=0.5"
    ))
    status_code(response)
    # [1] 200
    str(content(response))
    # List of 3
    #  $ data   :List of 2
    #   ..$ calendar :List of 3
    #   .. ..$ asOf   : chr "Mon, Oct 9, 2023"
    #   .. ..$ headers:List of 8
    #   .. .. ..$ symbol                   : chr "Symbol"
    #   .. .. ..$ companyName              : chr "Name"
    #   .. .. ..$ dividend_Ex_Date         : chr "Ex-Dividend Date"
    #   .. .. ..$ payment_Date             : chr "Payment Date"
    #   .. .. ..$ record_Date              : chr "Record Date"
    #   .. .. ..$ dividend_Rate            : chr "Dividend"
    #   .. .. ..$ indicated_Annual_Dividend: chr "Indicated Annual Dividend"
    #   .. .. ..$ announcement_Date        : chr "Announcement Date"
    #   .. ..$ rows   :List of 2
    #   .. .. ..$ :List of 8
    #   .. .. .. ..$ companyName              : chr "Brady Corporation Common Stock"
    #   .. .. .. ..$ symbol                   : chr "BRC"
    #   .. .. .. ..$ dividend_Ex_Date         : chr "10/09/2023"
    #   .. .. .. ..$ payment_Date             : chr "10/31/2023"
    #   .. .. .. ..$ record_Date              : chr "10/10/2023"
    #   .. .. .. ..$ dividend_Rate            : num 0.235
    #   .. .. .. ..$ indicated_Annual_Dividend: num 0.94
    #   .. .. .. ..$ announcement_Date        : chr "9/04/2023"
    #   .. .. ..$ :List of 8
    #   .. .. .. ..$ companyName              : chr "Saba Capital Income & Opportunities Fund SBI"
    #   .. .. .. ..$ symbol                   : chr "BRW"
    #   .. .. .. ..$ dividend_Ex_Date         : chr "10/09/2023"
    #   .. .. .. ..$ payment_Date             : chr "10/31/2023"
    #   .. .. .. ..$ record_Date              : chr "10/10/2023"
    #   .. .. .. ..$ dividend_Rate            : num 0.085
    #   .. .. .. ..$ indicated_Annual_Dividend: num 1.02
    #   .. .. .. ..$ announcement_Date        : chr "9/29/2023"
    #   ..$ timeframe:List of 2
    #   .. ..$ minDate: chr "1922-05-22T00:00:00"
    #   .. ..$ maxDate: chr "2104-06-02T00:00:00"
    #  $ message: NULL
    #  $ status :List of 3
    #   ..$ rCode           : int 200
    #   ..$ bCodeMessage    : NULL
    #   ..$ developerMessage: NULL
    

    I show the results from my headers, though in the end yours worked as well.

    How I Found This

    In my Firefox browser, I navigated to that URL and it immediately worked. Opening the browser devtools (F12), going to the Network tab, right-clicking the request row (which should show a 200) and selecting Copy Value >> Copy Request Headers, I got this:

    GET /api/calendar/dividends?date=2023-10-09 HTTP/2
    Host: api.nasdaq.com
    User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate, br
    Connection: keep-alive
    Upgrade-Insecure-Requests: 1
    Sec-Fetch-Dest: document
    Sec-Fetch-Mode: navigate
    Sec-Fetch-Site: cross-site
    Pragma: no-cache
    Cache-Control: no-cache
    

    (There was also a Cookie: header, but I removed it in the unlikely event it means something personal for me/my-browser *shrug*.)

    I converted these into an R-friendly vector of string arguments with

    hdrs <- clipr::read_clip()
    hdrs |>
      grep("^(GET|Host:) ", x = _, value = TRUE, invert = TRUE) |>
      sub("^([^:]+): (.*)$", ", `\\1` = \"\\2\"", x = _) |>
      paste(collapse = "\n") |>
      sub("^, ", "", x = _) |>
      cat("\n")
    # `User-Agent` = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0"
    # , `Accept` = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"
    # , `Accept-Language` = "en-US,en;q=0.5"
    # , `Accept-Encoding` = "gzip, deflate, br"
    # , `Connection` = "keep-alive"
    # , `Upgrade-Insecure-Requests` = "1"
    # , `Sec-Fetch-Dest` = "document"
    # , `Sec-Fetch-Mode` = "navigate"
    # , `Sec-Fetch-Site` = "cross-site"
    # , `Pragma` = "no-cache"
    # , `Cache-Control` = "no-cache"
    

    (my real console output doesn't have the # prepended), which I then copied and pasted into a template expression:

    response <- GET(url, add_headers(
      ###
    ))
    status_code(response)
    

    overwriting the ### with the cat'ed headers. I ran it and immediately got a 200, meaning it succeeded. I then removed all of the headers and tried again; it hung (no surprise). I added a few back at a time until I got a 200 again, ultimately narrowing it down to the two headers that made a difference.
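
    For what it's worth, that remove-and-retry loop can be mechanized as a backward elimination. A minimal sketch, where the works() predicate is a hypothetical stand-in for the real check (something like function(h) status_code(GET(url, add_headers(.headers = h))) == 200); the demo uses a mock predicate so it runs without hitting the network:

```r
# Backward elimination: start from a full header set known to work, try
# dropping each header in turn, and keep the drop only if requests still
# succeed. `works` is a predicate taking a named character vector of headers.
minimize_headers <- function(headers, works) {
  stopifnot(works(headers))  # the full set must succeed to begin with
  for (nm in names(headers)) {
    trimmed <- headers[setdiff(names(headers), nm)]
    if (works(trimmed)) headers <- trimmed  # header `nm` was not needed
  }
  headers
}

# Demo with a mock predicate that "succeeds" only when both User-Agent
# and Accept-Language are present (mimicking the behavior found above):
full <- c(
  `User-Agent` = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0",
  `Accept-Language` = "en-US,en;q=0.5",
  `Pragma` = "no-cache",
  `Cache-Control` = "no-cache"
)
mock_works <- function(h) all(c("User-Agent", "Accept-Language") %in% names(h))
names(minimize_headers(full, mock_works))
```

    Note this greedy pass finds *a* minimal set, not necessarily the smallest possible one, but for a handful of headers it matches the manual process above.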

    Ultimately I found what works (and I think your commented code is mostly the same).

    data.frame

    I'll use dplyr here since it tends to work with list-columns better than base R.

    library(dplyr)
    quux <- content(response)
    tibble(asOf = quux$data$calendar$asOf) %>%
      reframe(asOf, rows = lapply(quux$data$calendar$rows, as.data.frame)) %>%
      tidyr::unnest(rows)
    # # A tibble: 2 × 9
    #   asOf             companyName                                  symbol dividend_Ex_Date payment_Date record_Date dividend_Rate indicated_Annual_Dividend announcement_Date
    #   <chr>            <chr>                                        <chr>  <chr>            <chr>        <chr>               <dbl>                     <dbl> <chr>            
    # 1 Mon, Oct 9, 2023 Brady Corporation Common Stock               BRC    10/09/2023       10/31/2023   10/10/2023          0.235                      0.94 9/04/2023        
    # 2 Mon, Oct 9, 2023 Saba Capital Income & Opportunities Fund SBI BRW    10/09/2023       10/31/2023   10/10/2023          0.085                      1.02 9/29/2023        
    

    I think quux$data$calendar$headers is just a mapping from the names found in rows to more human-readable display names, so it didn't add much. Similarly, minDate and maxDate may be nice to know, but perhaps not as important in the resulting frame.
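
    If the human-readable labels are wanted anyway, that mapping can be applied to the assembled frame's column names. A minimal sketch with a hand-built stand-in for the headers mapping and a two-row frame mirroring the output above:

```r
# Stand-in for quux$data$calendar$headers: internal name -> display label
labels <- list(
  symbol = "Symbol",
  companyName = "Name",
  dividend_Rate = "Dividend"
)

# Two rows mirroring the unnested result above
df <- data.frame(
  symbol = c("BRC", "BRW"),
  companyName = c("Brady Corporation Common Stock",
                  "Saba Capital Income & Opportunities Fund SBI"),
  dividend_Rate = c(0.235, 0.085)
)

# Rename only the columns that appear in the label mapping
hit <- names(df) %in% names(labels)
names(df)[hit] <- unlist(labels[names(df)[hit]])
names(df)
# [1] "Symbol"   "Name"     "Dividend"
```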

    There's likely a better way to frame-ize it, but this works for now and might be good enough for getting at the data.
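
    One simpler route worth mentioning: jsonlite will simplify the rows into a data.frame on its own if you hand it the raw JSON text (i.e. fromJSON(content(response, as = "text"))). A sketch using a trimmed-down JSON literal with the same shape as the str() output above, so it runs without a network call:

```r
library(jsonlite)

# Trimmed-down stand-in for content(response, as = "text"); with a live
# response you would parse that text instead of this literal.
txt <- '{
  "data": {
    "calendar": {
      "asOf": "Mon, Oct 9, 2023",
      "rows": [
        {"symbol": "BRC", "companyName": "Brady Corporation Common Stock",
         "dividend_Ex_Date": "10/09/2023", "dividend_Rate": 0.235},
        {"symbol": "BRW", "companyName": "Saba Capital Income & Opportunities Fund SBI",
         "dividend_Ex_Date": "10/09/2023", "dividend_Rate": 0.085}
      ]
    }
  }
}'

quux <- fromJSON(txt)            # simplifyDataFrame = TRUE by default
df <- quux$data$calendar$rows    # already a data.frame, one row per record
df$asOf <- quux$data$calendar$asOf
str(df)
```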