I'm trying to pull down the JSON file from the NASDAQ request URL in R, but I've been unsuccessful. I've been able to craft a solution in Python, but I'm really interested in finding a solution in R if possible.
I've reached out to the NASDAQ helpdesk, and accessing the data via the request API URL is not something they support.
I've seen several solutions on Stack Overflow, but none of them appear to work in my particular situation. Here is the simplest attempt and the result I received:
library(jsonlite)
# Read the JSON straight from the URL; this times out (error below)
data <- fromJSON('https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09')
Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning message:
In open.connection(con, "rb") :
InternetOpenUrl failed: 'The operation timed out'
I don't believe the above attempt is the correct approach, so I'm hoping someone sees this and has a solution that will work.
My expected result is that the JSON would be retrieved from the request API URL (https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09) and converted into an R data frame.
Indeed, the site is (intentionally?) hanging on some queries depending on the request headers.
library(httr)
url <- 'https://api.nasdaq.com/api/calendar/dividends?date=2023-10-09'
# Mimic a browser; without these two headers the request hangs until it times out
response <- GET(url, add_headers(
  `User-Agent` = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0",
  `Accept-Language` = "en-US,en;q=0.5"
))
status_code(response)
# [1] 200
str(content(response))
# List of 3
# $ data :List of 2
# ..$ calendar :List of 3
# .. ..$ asOf : chr "Mon, Oct 9, 2023"
# .. ..$ headers:List of 8
# .. .. ..$ symbol : chr "Symbol"
# .. .. ..$ companyName : chr "Name"
# .. .. ..$ dividend_Ex_Date : chr "Ex-Dividend Date"
# .. .. ..$ payment_Date : chr "Payment Date"
# .. .. ..$ record_Date : chr "Record Date"
# .. .. ..$ dividend_Rate : chr "Dividend"
# .. .. ..$ indicated_Annual_Dividend: chr "Indicated Annual Dividend"
# .. .. ..$ announcement_Date : chr "Announcement Date"
# .. ..$ rows :List of 2
# .. .. ..$ :List of 8
# .. .. .. ..$ companyName : chr "Brady Corporation Common Stock"
# .. .. .. ..$ symbol : chr "BRC"
# .. .. .. ..$ dividend_Ex_Date : chr "10/09/2023"
# .. .. .. ..$ payment_Date : chr "10/31/2023"
# .. .. .. ..$ record_Date : chr "10/10/2023"
# .. .. .. ..$ dividend_Rate : num 0.235
# .. .. .. ..$ indicated_Annual_Dividend: num 0.94
# .. .. .. ..$ announcement_Date : chr "9/04/2023"
# .. .. ..$ :List of 8
# .. .. .. ..$ companyName : chr "Saba Capital Income & Opportunities Fund SBI"
# .. .. .. ..$ symbol : chr "BRW"
# .. .. .. ..$ dividend_Ex_Date : chr "10/09/2023"
# .. .. .. ..$ payment_Date : chr "10/31/2023"
# .. .. .. ..$ record_Date : chr "10/10/2023"
# .. .. .. ..$ dividend_Rate : num 0.085
# .. .. .. ..$ indicated_Annual_Dividend: num 1.02
# .. .. .. ..$ announcement_Date : chr "9/29/2023"
# ..$ timeframe:List of 2
# .. ..$ minDate: chr "1922-05-22T00:00:00"
# .. ..$ maxDate: chr "2104-06-02T00:00:00"
# $ message: NULL
# $ status :List of 3
# ..$ rCode : int 200
# ..$ bCodeMessage : NULL
# ..$ developerMessage: NULL
I show the results from my headers, though in the end yours worked as well.
In my Firefox browser, I navigated to that URL and it worked immediately. Opening the browser devtools (F12), going to the Network tab, right-clicking on the request row (which should show a 200), and selecting Copy Value >> Copy Request Headers, I got this:
GET /api/calendar/dividends?date=2023-10-09 HTTP/2
Host: api.nasdaq.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Pragma: no-cache
Cache-Control: no-cache
(There was also a Cookie: header, but I removed it in the unlikely event it means something personal for me/my browser, *shrug*.)
I converted these into an R-friendly vector of string arguments with
hdrs <- clipr::read_clip()  # the copied request headers, read from the clipboard
hdrs |>
  grep("^(GET|Host:) ", x = _, value = TRUE, invert = TRUE) |>  # drop the request line and Host:
  sub("^([^:]+):", ", `\\1` =", x = _) |>                       # "Name: value" -> ", `Name` = value"
  paste(collapse = "\n") |>
  sub("^, ", "", x = _) |>                                      # no leading comma on the first header
  cat("\n")
# `User-Agent` = Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0
# , `Accept` = text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
# , `Accept-Language` = en-US,en;q=0.5
# , `Accept-Encoding` = gzip, deflate, br
# , `Connection` = keep-alive
# , `Upgrade-Insecure-Requests` = 1
# , `Sec-Fetch-Dest` = document
# , `Sec-Fetch-Mode` = navigate
# , `Sec-Fetch-Site` = cross-site
# , `Pragma` = no-cache
# , `Cache-Control` = no-cache
(my real console output doesn't have the # prepended), which I then copy/pasted into this template expression:
response <- GET(url, add_headers(
###
))
status_code(response)
overwriting the ### with the cat'ed headers. I ran it and immediately got a 200, meaning it succeeded. I then removed all of the headers and tried again; it hung (no surprise). I added a few back at a time until I got a 200 again, eventually narrowing it down to the two headers that made a difference; a rough way to automate that test is sketched below.
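For what it's worth, the copy/paste step isn't strictly necessary. This is just a sketch (none of it is from my original run): it re-parses hdrs into a named character vector, splices it into add_headers() via its .headers= argument, and then tries each header on its own with a short timeout so a hang fails fast.
library(httr)
# Sketch: parse the copied header lines into a named character vector
hdr_lines  <- grep("^(GET|Host:) ", hdrs, value = TRUE, invert = TRUE)
hdr_names  <- sub(":.*", "", hdr_lines)
hdr_values <- trimws(sub("^[^:]+:", "", hdr_lines))
# All headers at once, spliced in without any copy/paste
response <- GET(url, add_headers(.headers = setNames(hdr_values, hdr_names)))
status_code(response)
# Try each header alone; timeout(5) makes a "hang" fail fast with an error
for (i in seq_along(hdr_names)) {
  resp <- tryCatch(
    GET(url, add_headers(.headers = setNames(hdr_values[i], hdr_names[i])), timeout(5)),
    error = function(e) NULL
  )
  cat(hdr_names[i], "->", if (is.null(resp)) "timed out" else status_code(resp), "\n")
}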
Ultimately I found what works (and I think your commented code is mostly the same).
I'll use dplyr here since it tends to work with list-columns better than base R.
library(dplyr)
quux <- content(response)  # parsed JSON as nested lists
tibble(asOf = quux$data$calendar$asOf) %>%
  reframe(asOf, rows = lapply(quux$data$calendar$rows, as.data.frame)) %>%
  tidyr::unnest(rows)
# # A tibble: 2 × 9
# asOf companyName symbol dividend_Ex_Date payment_Date record_Date dividend_Rate indicated_Annual_Dividend announcement_Date
# <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
# 1 Mon, Oct 9, 2023 Brady Corporation Common Stock BRC 10/09/2023 10/31/2023 10/10/2023 0.235 0.94 9/04/2023
# 2 Mon, Oct 9, 2023 Saba Capital Income & Opportunities Fund SBI BRW 10/09/2023 10/31/2023 10/10/2023 0.085 1.02 9/29/2023
I think the quux$data$calendar$headers is just a way to map the names found in rows to more human-readable versions of the same, so it didn't add much. Similarly, minDate and maxDate may be nice for you to know, but perhaps not as important in the resulting frame.
There's likely a better way to frame-ize it (one jsonlite-based alternative is sketched below), but this works for now and might be good enough for getting at the data.
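For instance, jsonlite can do most of the flattening itself: by default it simplifies an array of objects into a data.frame. A minimal sketch, re-parsing the body of the same response object as text:
library(jsonlite)
# simplifyDataFrame = TRUE (the default) turns rows into a data.frame directly
parsed <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
df <- parsed$data$calendar$rows
df$asOf <- parsed$data$calendar$asOf  # carry the as-of date along, as above
head(df)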