I am trying to use the RStudio / Hadley Wickham 'httr' R package to return all records from an Okta API GET request ('List Users Assigned to Application'). The following request works perfectly fine to get the maximum limit of records (500) per call:
oktaurl <- "https://mydomain.okta.com/api/v1/apps/applicationID/users?limit=500"
oktagetjson <- with_verbose(content(GET(
  oktaurl,
  add_headers(
    "Authorization" = "bearer myapikey",
    "Content-Type" = "application/json;charset=UTF-8"
  )
)))
Parsing the returned 'oktagetjson' data into a usable data frame with 'jsonlite' is not a problem. However, this particular API call is hard-limited to a maximum of 500 records per call, so I need to follow the pagination 'Link:' headers from one response to the next in order to retrieve all several thousand records. The 'Link:' headers themselves are in the form of:
Link: <https://mydomain.okta.com/api/v1/apps/applicationID/users?limit=500>; rel="self"
Link: <https://mydomain.okta.com/api/v1/apps/applicationID/users?after=random cursor string&limit=500>; rel="next"
(The Okta API documentation describes their pagination structure here)
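For reference, the `rel="next"` URL can be pulled out of header values of this shape with a regular expression. The following is only a minimal base-R sketch: `next_link_url` is a hypothetical helper name, `abc123` is a made-up cursor, and how the header values are obtained from the response is left open.

```r
# Hypothetical helper: return the URL tagged rel="next" from one or more
# Link header values, or NA_character_ when no next page is advertised.
next_link_url <- function(link_values) {
  # A single header value may carry several comma-separated links,
  # so split on commas that are followed by the "<" of the next link.
  parts <- trimws(unlist(strsplit(link_values, ",\\s*(?=<)", perl = TRUE)))
  is_next <- grepl('rel="next"', parts, fixed = TRUE)
  if (!any(is_next)) {
    return(NA_character_)
  }
  # Keep only the text between the angle brackets
  sub("^<([^>]*)>.*$", "\\1", parts[is_next][1])
}

# Example values in the same shape as the headers above (cursor is made up)
example_links <- c(
  '<https://mydomain.okta.com/api/v1/apps/applicationID/users?limit=500>; rel="self"',
  '<https://mydomain.okta.com/api/v1/apps/applicationID/users?after=abc123&limit=500>; rel="next"'
)
next_url <- next_link_url(example_links)
```

Returning `NA_character_` when no `rel="next"` entry is present gives a calling loop a simple stop condition.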
I am stuck here:
headers(HEAD("https://mydomain.okta.com/api/v1/apps/<applicationID>/users"))
returns some headers but does not return the pagination 'Link:' headers.

Unfortunately, due to the nature of the request, service provider and data, I cannot provide a fully reproducible example with real links and sample data, but I hope the concept is clear enough for someone to point me in the right direction, even if that direction is to not use the 'httr' package or R for this effort.
Thank you for your consideration.
I hacked something together a while ago that works but certainly won't win any elegance awards. I have since modified it to get users assigned to Okta applications as well. It is useful if you are auditing or joining with other company or directory data.
library(jsonlite)
library(dplyr)
library(httr)
library(purrr)
library(stringi)
library(tidyr)
# create character vector to hold URLs we'll use later when we GET content
url_list <- character()
# list placeholder for GET content
okta_content <- list()
# initial URL construction parts for first URL
okta_urllimit <- "200"
okta_baseurl <- paste0("https://<your company>.okta.com/api/v1/users?limit=", okta_urllimit)
# next URL construction parts for 'next' URLs
basenexturl <- "https://<your company>.okta.com/api/v1/users?after="
baselimiturl <- paste0("&limit=", okta_urllimit)
# Pass initial URL to get first batch
okta_get01 <- httr::GET(
  okta_baseurl,
  config = add_headers(Authorization = "SSWS <your Okta API key>")
)
# record the URL we just requested
url_list <- append(url_list, okta_baseurl)
# flatten every header of the response into one character vector
testallheaders <- as.character(unlist(okta_get01$all_headers))
# keep the parsed body of the first batch
okta_content <- append(okta_content, content(okta_get01))
# If "next" is in the second link URL (testallheaders[16]), iterate for as long as
# the next response's header element still has "next" in it.
# NOTE: position 16 depends on how many headers the server returns, so check it
# against your own flattened header vector before relying on it.
while (grepl("next", testallheaders[16])) {
  # parse the 'after' cursor value out of the next URL
  testparsenext <- regmatches(testallheaders[16], gregexpr('(?<=after=).*?(?=&limit)', testallheaders[16], perl = TRUE))[[1]]
  # and create the next URL
  oktaurlnext <- paste0(basenexturl, testparsenext, baselimiturl)
  # fetch the next batch, replacing okta_get01 with each subsequent response
  okta_get01 <- httr::GET(
    oktaurlnext,
    config = add_headers(Authorization = "SSWS <your Okta API key>")
  )
  testallheaders <- as.character(unlist(okta_get01$all_headers))
  url_list <- append(url_list, oktaurlnext)
  okta_content <- append(okta_content, content(okta_get01))
}
# Parse the results into something usable
oktagettojson <- toJSON(okta_content, simplifyDataFrame = TRUE, flatten = TRUE, recursive = TRUE)
oktagetdf <- fromJSON(oktagettojson, simplifyDataFrame = TRUE, flatten = TRUE)
dfnames <- names(oktagetdf)
oktagetdf <- oktagetdf %>% map_if(is.list, as.character)
oktagetdf <- do.call(cbind, lapply(oktagetdf, data.frame, stringsAsFactors=FALSE))
names(oktagetdf) <- dfnames
# adding columns to separate AD domain mastered account and domain names
oktagetdf <- separate(oktagetdf, profile.login,
  into = c("credPrefix", "credSuffix"), sep = "@", remove = FALSE, extra = "drop")
# select some data frame columns of interest
okta_allusers <- subset(oktagetdf, select = c(
  "id", "status", "created", "lastLogin",
  "profile.login", "credPrefix", "credSuffix",
  "profile.firstName", "profile.lastName", "profile.email",
  "credentials.provider.type", "credentials.provider.name"
))
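The fetch-and-follow loop used earlier in this answer can also be separated from the HTTP details, which makes it possible to try the paging logic offline. The following is only a sketch under stated assumptions: `fetch_all_pages` and the `fetch_page` callback are hypothetical names, and the callback is assumed to return a list with `items` (the records of one page) and `next_url` (the next URL, or `NA` on the last page). In real use the callback would wrap the `httr::GET` call and the header parsing.

```r
# Hypothetical helper: keep requesting pages until no next URL is reported.
# fetch_page(url) must return list(items = <list of records>,
#                                  next_url = <string, or NA when done>).
fetch_all_pages <- function(first_url, fetch_page, max_pages = 1000) {
  results <- list()
  url <- first_url
  pages <- 0
  # max_pages is a safety cap so a misbehaving server cannot loop forever
  while (!is.na(url) && pages < max_pages) {
    page <- fetch_page(url)
    results <- c(results, page$items)
    url <- page$next_url
    pages <- pages + 1
  }
  results
}

# Stand-in fetcher over fake data, so the loop can be exercised without a network
fake_pages <- list(
  page1 = list(items = list("a", "b"), next_url = "page2"),
  page2 = list(items = list("c"), next_url = NA_character_)
)
all_items <- fetch_all_pages("page1", function(url) fake_pages[[url]])
length(all_items)  # 3 records collected across the two fake pages
```

Passing the fetcher in as an argument keeps the stop condition (no next URL) in one place, independent of where a particular header happens to sit in the response.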