I'm working with the NIH/NLM REST API and attempting to programmatically pull lots of data at once. I've never worked with an API that validates with Service Tickets (TGT and ST) instead of OAUTH, that need to be refreshed for every GET request you make, so I'm not sure if I"m even going about this the right way. Any help much appreciated.
Here's the code I currently have:
library(httr)
library(jsonlite)
library(xml2)
UTS_API_KEY <- 'MY API KEY'
# post to the CAS endpoint
response <- POST('https://utslogin.nlm.nih.gov/cas/v1/api-key', encode='form', body=list(apikey = 'MY API KEY'))
# print out the status_code and content_type
status_code(response)
headers(response)$`content-type`
doc <- content(response)
action_uri <- xml_text(xml_find_first(doc, '//form/@action'))
action_uri
# Service Ticket
response <- POST(action_uri, encode='form', body=list(service = 'http://umlsks.nlm.nih.gov'))
ticket <- content(response, 'text')
ticket #this is the ST I need for every GET request I make
# build search_uri using the paste function for string concatenation
version <- 'current'
search_uri <- paste('https://uts-ws.nlm.nih.gov/rest/search/', version, sep='')
# pass the the query params into httr GET to get the response
query_string <- 'diabetic foot'
response <- GET(search_uri, query=list(ticket=ticket, string=query_string))
## print out some of the results
search_uri
status_code(response)
headers(response)$`content-type`
search_results_auto_parsed <- content(response)
search_results_auto_parsed
class(search_results_auto_parsed$result$results)
search_results_data_frame <- fromJSON(content(response,'text'))
search_results_data_frame
This code works perfectly for just a handful of GET requests, however, I'm attempting to pull 300-something medical terms. For example, in query string, I'd like to loop through an array of strings (e.g., "diabetes", "blood pressure", "cardiovascular care", "EMT", etc.). I'd need to make the POST request and pass the ST into the GET parameter for every string in the array.
I've played around with this code:
for (i in 1:length(Entity_Subset$Entities)){
ent = Entity_Subset$Entities[i] #Entities represents my df of strings
url <- paste(' https://uts-ws.nlm.nih.gov/rest/search/current?string=',
ent,'&ticket=', sep = "")
print(url)
}
But haven't had much luck piecemealing together the POST and GET requests after putting the strings into the (GET) HTTPS request.
Sidebar: I also attempted writing some pre-scripts in Postman, but oddly the Service Ticket doesn't return as JSON (no key-value pair to grab and pass). Just plain text.
Thank you for any advice you can provide!
I think you can simply wrap both POST and GET requests in a function. Then, lapply
that function to a list of characters.
library(httr)
library(jsonlite)
library(xml2)
fetch_data <- function(query_string = 'diabetic foot', UTS_API_KEY = 'MY API KEY', version = 'current') {
response <- POST('https://utslogin.nlm.nih.gov/cas/v1/api-key', encode='form', body=list(apikey = UTS_API_KEY))
# print out the status_code and content_type
message(status_code(response), "\n", headers(response)$`content-type`)
action_uri <- xml_text(xml_find_first(content(response), '//form/@action')); message(action_uri)
# Service Ticket
response <- POST(action_uri, encode = 'form', body=list(service = 'http://umlsks.nlm.nih.gov'))
ticket <- content(response, 'text'); message(ticket)
# build search_uri using the paste function for string concatenation
search_uri <- paste0('https://uts-ws.nlm.nih.gov/rest/search/', version)
# pass the the query params into httr GET to get the response
response <- GET(search_uri, query=list(ticket=ticket, string=query_string))
## print out some of the results
message(search_uri, "\n", status_code(response), "\n", headers(response)$`content-type`)
fromJSON(content(response, 'text'))
}
# if you have a list of query strings, then
lapply(Entity_Subset$Entities, fetch_data, UTS_API_KEY = "blah blah blah")
# The `lapply` above is logically equivalent to
result <- vector("list", length(Entity_Subset$Entities))
for (x in Entity_Subset$Entities) {
result[[x]] <- fetch_data(x, "blah blah blah")
}