Using R and httr to retrieve data - Okta API GET request with paginated link headers

I am trying to use the RStudio / Hadley Wickham 'httr' R package to return all records from an Okta API GET request ('List Users Assigned to Application'). The following request works perfectly fine to get the maximum limit of records (500) per call:

oktaurl <- "https://mydomain.okta.com/api/v1/apps/applicationID/users?limit=500"

oktagetjson <- with_verbose(content(GET(oktaurl,
                                        add_headers("Authorization" = "bearer myapikey",
                                                       "Content-Type" =  "application/json;charset=UTF-8"))))

Parsing the 'oktagetjson' returned data into a usable data frame with 'jsonlite' and R is not a problem; however, this particular API call is hard limited to a maximum of 500 records per call so I need to somehow retrieve and paginate through all the 'Link:' headers to get all several thousand records. The 'Link:' headers themselves are in the form of:

Link: <https://mydomain.okta.com/api/v1/apps/applicationID/users?limit=500>; rel="self"

Link: <https://mydomain.okta.com/api/v1/apps/applicationID/users?after=random cursor string&limit=500>; rel="next"

(The Okta API documentation describes their pagination structure here)

I am stuck here:

I can see the first two pagination 'Link:' headers listed above in the R / RStudio console when making the call 'oktagetjson <- with_verbose(content(GET(oktaurl, etc ... ) ...) ' to get my oktagetjston object but the 'Link:' headers are not returned as part of the object itself. Calling headers(HEAD("https://mydomain.okta.com/api/v1/apps/<applicationID>/users")) returns some headers but does not return the pagination 'Link:' headers
The 'Link:' headers contain random cursor strings so I cannot guess their actual format
Even if I could retrieve all of the required 'Link:' headers, I have no idea how to call / iterate / paginate / recursively follow through all of them in R to build an object of the entire data set of several thousand records.

Unfortunately, due to the nature of the request, service provider and data I cannot provide a fully reproducible example with real links and sample data but I hope the concept is clear enough for someone to point me in the right direction - even if that direction is to not use the 'httr' package or R for this effort.

Thank you for your consideration.

Solution

Hacked something together a while ago that works but certainly won't win any elegance awards. Have modified it to get users assigned to Okta applications as well. Useful if you are auditing / joining with other company / directory data.

library(jsonlite)
library(dplyr)
library(httr)
library(purrr)
library(stringi)
library(tidyr)

# create character vector to hold URLs we'll use later when we GET content
url_list <- as.character()

# list placeholder for GET content
okta_content <- list()

# initial URL construction parts for first URL
okta_urllimit = as.character("200")
okta_baseurl <- paste0("https://<your company>.okta.com/api/v1/users?limit=",okta_urllimit)

# next URL construction parts for 'next' URLs
basenexturl <- "https://<your company>.okta.com/api/v1/users?after="
baselimiturl <- "&limit=200"

# Pass initial URL to get first batch
okta_get01 <- httr::GET(okta_baseurl,
                         config = (
                           add_headers(Authorization = "SSWS <your Okta API key>")))


# append the URL vector 
url_list <- append(url_list, okta_baseurl)


# unlist the all_headers list element from the URL
testallheaders <- as.character(unlist(okta_get01$all_headers))

okta_content <- append(okta_content,content(okta_get01))

# if "next" is in the second link URL (testallheaders[16]) then iterate for as long as
# the next URL header element has "next" in it

while (
  grepl("next",testallheaders[16]) == 'TRUE'
)

{ 
  # parse the sha value 
  testparsenext <- regmatches(testallheaders[16], gregexpr('(?<=after=).*?(?=&limit)',testallheaders[16], perl=T))[[1]]
  # and create URL
  oktaurlnext <- paste0(basenexturl,testparsenext,baselimiturl)


  # iterate and replace 'okta_baseurl' with each subsquent oktaurlnext

  okta_get01 <- httr::GET(oktaurlnext,
                           config = (
                             add_headers(Authorization = "SSWS <your Okta API key>")))

  testallheaders <- as.character(unlist(okta_get01$all_headers))
  url_list <- append(url_list, oktaurlnext)
  okta_content <- append(okta_content,content(okta_get01))

  next
}


# Parse the results into something usable

oktagettojson <- toJSON(okta_content, simplifyDataFrame = TRUE, flatten = TRUE, recursive = TRUE)
oktagetdf <- fromJSON(oktagettojson, simplifyDataFrame = TRUE, flatten = TRUE)
dfnames <- names(oktagetdf)
oktagetdf <- oktagetdf %>% map_if(is.list, as.character)
oktagetdf <- do.call(cbind, lapply(oktagetdf, data.frame, stringsAsFactors=FALSE))
names(oktagetdf) <- dfnames

# adding columns to separate AD domain mastered account and domain names
oktagetdf <- separate(oktagetdf, profile.login,
                      into = c("credPrefix", "credSuffix"), sep = "@", remove = FALSE, extra = "drop")

# select some data frame columns of interest
okta_allusers <- subset(oktagetdf, select = c("id","status","created","lastLogin","profile.login","credPrefix", "credSuffix","profile.firstName","profile.lastName","profile.email","credentials.provider.type","credentials.provider.name"))