I am unable to download or read a zip file from an API request using httr package. Is there another package I can try that will allow me to download/read binary zip files stored within the response of a get request in R?
I tried two ways:
used GET to get an application/json type response object (successful) and then used fromJSON to extract content using content(my_response, 'text'). The output includes a column called 'zip' which is the data I'm interested in downloading, which documentation states is a base64 encoded binary file. This column is currently a really long string of random letters and I'm not sure how to convert this to the actual dataset.
I tried bypassing using fromJSON because I noticed there is a field of class 'raw' within the response object itself. This object is a list of random numbers which I suspect are the binary representation of the dataset. I tried using rawToChar(my_response$content) to try to convert the raw data type into character, but this results in the same long character string being produced as in #1.
getzip1 <- GET(getzip1_link)
getzip1 # successful response, status 200
df <- fromJSON(content(getzip1, "text"))
df$status # "OK"
df$dataset$zip # <- this is the very long string of letters (eg. "I1NC5qc29uUEsBAhQDFA...")
# Method 1: try to convert from the 'zip' object in the output of fromJSON
try1 <- base64_dec(df$dataset$zip)
#looks similar to getzip1$content (i.e. this produces the list of numbers/letters 50 4b 03 04 14 00, etc, perhaps binary representation)
# Method 2: try to get data directly from raw object
class(getzip1$content) # <- 'raw' class object directly from GET request
try2 <- rawToChar(getzip1$content) #returns same output as df$data$zip
I should be able to use either the raw 'content' object from my response or the long character string in the 'zip' object of the output of fromJSON in order to view the dataset or somehow download it. I don't know how to do this. Please help!
welcome!
Based on the documentation for the API the response to the getDataset
endpoint has schema
Dataset archive including meta information, the dataset itself is base64 encoded to allow for binary ZIP transfers.
{
"status": "OK",
"dataset": {
"state_id": 5,
"session_id": 1624,
"session_name": "2019-2020 Regular Session",
"dataset_hash": "1c7d77fe298a4d30ad763733ab2f8c84",
"dataset_date": "2018-12-23",
"dataset_size": 317775,
"mime": "application\/zip",
"zip": "MIME 64 Encoded Document"
}
}
We can use R for obtaining the data with the following code,
library(httr)
library(jsonlite)
library(stringr)
library(maditr)
token <- "" # Your API key
session_id <- 1253L # Obtained from the getDatasetList endpoint
access_key <- "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile <- file.path("path", "to", "file.zip") # Modify
response <- str_c("https://api.legiscan.com/?key=",
token,
"&op=getDataset&id=",
session_id,
"&access_key=",
access_key) %>%
GET()
status_code(x = response) == 200 # Good
body <- content(x = response,
as = "text",
encoding = "utf8") %>%
fromJSON() # This contains some extra metadata
content(x = response,
as = "text",
encoding = "utf8") %>%
fromJSON() %>%
getElement(name = "dataset") %>%
getElement(name = "zip") %>%
base64_dec() %>%
writeBin(con = destfile)
unzip(zipfile = destfile)
unzip
will unzip the files which in this case will look like
hash.md5 # Can be checked against the metadata
AL/2016-2016_1st_Special_Session/bill/*.json
AL/2016-2016_1st_Special_Session/people/*.json
AL/2016-2016_1st_Special_Session/vote/*.json
As always, wrap your code in functions and profit.
PS: Here is how the code would like like in Julia as a comparison.
using Base64, HTTP, JSON3, CodecZlib
token = "" # Your API key
session_id = 1253 # Obtained from the getDatasetList endpoint
access_key = "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile = joinpath("path", "to", "file.zip") # Modify
response = string("https://api.legiscan.com/?",
join(["key=$token",
"op=getDataset",
"id=$session_id",
"access_key=$access_key"],
"&")) |>
HTTP.get
@assert response.status == 200
JSON3.read(response.body) |>
(content -> content.dataset.zip) |>
base64decode |>
(data -> write(destfile, data))
run(pipeline(`unzip`, destfile))