I am attempting to create a function that can iterate over a specified time span (e.g., last 30 days or last 90 days). I'm limited to 2,500 records per pull, so I may need to perform a pull for 1 day at a time, or 1 week at a time depending on my parameters.
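Keeping each pull under the 2,500-record limit comes down to picking a window size (1 day, 7 days, ...) and splitting the span into consecutive windows. Below is a minimal base-R sketch. The helper `make_windows()` is hypothetical and returns one row per window (`from`, `to`). It assumes the API treats both ends of a range as inclusive, so consecutive windows neither overlap nor leave gaps.

```r
# Hypothetical helper: split [start_date, end_date] into consecutive,
# non-overlapping windows of `step_days` days each (the last may be shorter).
# Assumes the API's date ranges include both endpoints.
make_windows <- function(start_date, end_date, step_days = 1) {
  stopifnot(step_days >= 1, start_date <= end_date)
  # Start of each window; a numeric `by` in seq() on Dates counts days
  from <- seq(start_date, end_date, by = step_days)
  # Each window ends the day before the next one starts, capped at end_date
  to <- pmin(from + step_days - 1, end_date)
  data.frame(from = from, to = to)
}

# Example: the last 30 days in 7-day windows
w <- make_windows(Sys.Date() - 29, Sys.Date(), step_days = 7)
w
```

Each row's `from` and `to` could then be pasted into the `v1=` parameter as the two ends of the range.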
I have looked at "API Query for loop" here, but I can't quite get it to do what I want. I have written a while() loop that prints one URL per date:
end_date <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")
the_date <- start_date
while (the_date <= end_date)
{
  api <- paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                the_date,
                "^",
                end_date,
                "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  the_date <- the_date + 1
  print(api)
}
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-27^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-28^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-29^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-30^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
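The loop above prints each URL but does not store them. Because `paste0()` is vectorized, one call over a sequence of dates returns the whole character vector. Below is a minimal sketch using the URL pieces from the question, with the trailing `&col...` left elided as it is there. The names `url_prefix`, `url_suffix` and `days` are illustrative. The sketch builds one-day ranges (`v1=day^day`). That assumes the API's `between` operator includes both bounds. Whether a same-day range on the `eTimes.03` field returns a full day of records is not something I can confirm.

```r
# URL pieces copied from the question; the trailing "&col..." is elided there too
url_prefix <- "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1="
url_suffix <- "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."

start_date <- as.Date("2020-01-27")
end_date   <- as.Date("2020-01-30")

# One Date per day in the span
days <- seq(start_date, end_date, by = "day")

# paste0() works element-wise over `days`, giving one URL per day (v1 = day^day).
# Assumes the API's "between" operator includes both bounds.
urls <- paste0(url_prefix, days, "^", days, url_suffix)
urls
```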
Here is where I get stuck. I would like to create a function that iterates over each URL and then combines the results into a single data frame.
When I perform a single pull, I use the following:
library(httr)  # GET()
library(XML)   # xmlTreeParse(), getNodeSet(), xmlToDataFrame()
api_get <- GET(url)
api_raw <- rawToChar(api_get$content)
api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
api_df <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
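Rather than repeating this block for every date, the steps can be wrapped in a function of the URL. Below is a minimal sketch with two hypothetical helpers, `parse_pcrs()` and `pull_pcrs()`. It uses the same httr and XML calls as above, with two additions:

- httr's `stop_for_status()`, so an HTTP error fails loudly instead of being parsed.
- `asText = TRUE`, so the response body is explicitly treated as XML text rather than a file name.

The calls are namespaced (`httr::`, `XML::`), so both packages must be installed before the functions are called.

```r
# Hypothetical helpers wrapping the single-pull steps from the question.

# Parse one XML response body into a data frame with one row per <pcr> node
parse_pcrs <- function(xml_text) {
  tree <- XML::xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)
  XML::xmlToDataFrame(tree, nodes = XML::getNodeSet(tree, "//pcr"))
}

# Fetch one URL and parse it; stops with an error on an HTTP error status
pull_pcrs <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  parse_pcrs(rawToChar(resp$content))
}
```

Each pull then becomes a single call, e.g. `api_df <- pull_pcrs(url)`.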
Writing this block out 30 times (once per day) is clearly not an efficient approach, so I'm hoping to get some help with this.
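The iterate-then-combine step can also be written with `lapply()` over a vector of URLs instead of a `while` loop. Below is a minimal sketch with a hypothetical `pull_all()`. The `fetch_one` argument is an assumption: it takes one URL and returns a data frame. In practice it would be a function running the GET/parse steps above. Passing it in lets the sketch be tried offline with a fake fetcher. A failed pull is reported as a warning and skipped, rather than stopping the whole run.

```r
# Hypothetical sketch: fetch every URL with `fetch_one` and stack the results.
# `fetch_one` must take one URL and return a data frame.
pull_all <- function(urls, fetch_one, pause = 0.7) {
  results <- lapply(urls, function(u) {
    message("Pulling: ", u)  # track progress, handy when a pull fails
    # On error, warn and return NULL so the remaining URLs are still pulled
    df <- tryCatch(fetch_one(u), error = function(e) {
      warning("Pull failed for ", u, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
    Sys.sleep(pause)  # short pause between requests to go easy on the server
    df
  })
  # Drop failed (NULL) or empty pulls, then row-bind into one data frame
  results <- Filter(function(df) NROW(df) > 0, results)
  do.call(rbind, results)
}

# Offline example with a fake fetcher that returns one row per URL
fake_fetch <- function(u) data.frame(url = u, n = nchar(u))
all_df <- pull_all(c("a", "bb", "ccc"), fake_fetch, pause = 0)
all_df
```

If every pull fails or comes back empty, `do.call(rbind, list())` returns `NULL`, so check for that before using the result.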
This script should work, assuming your URL construction and your parsing of the API response are correct. See the comments for details:
library(httr)  # GET()
library(XML)   # xmlTreeParse(), getNodeSet(), xmlToDataFrame()

end_date <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")
the_date <- start_date
# Create an empty list to hold one data frame per date
output <- list()
while (the_date <= end_date)
{
  # Track which date is being pulled - handy for debugging when the script errors
  print(the_date)
  # v1 spans the_date through end_date, as in the question, so successive pulls overlap
  url <- paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                the_date,
                "^",
                end_date,
                "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  api_get <- GET(url)
  api_raw <- rawToChar(api_get$content)
  api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
  # Append the data frame to the list - item named by date
  output[[as.character(the_date)]] <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
  # Brief pause between requests to avoid hammering the server
  Sys.sleep(0.7)
  the_date <- the_date + 1
}
# Combine all of the data frames in the output list into one large data frame
alloutput <- do.call(rbind, output)
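One caveat with `do.call(rbind, output)`: `rbind()` on data frames requires every data frame to have the same column names. `xmlToDataFrame()` builds columns from whichever child elements appear, so a day where some element never occurs can make the combine step fail. `dplyr::bind_rows()` fills missing columns with `NA`. Below is a base-R sketch of the same idea with a hypothetical `bind_fill()` helper.

```r
# Hypothetical helper: row-bind a list of data frames whose columns may differ,
# filling columns missing from a given data frame with NA.
bind_fill <- function(dfs) {
  dfs <- Filter(function(df) NROW(df) > 0, dfs)  # drop NULL/empty pulls
  if (length(dfs) == 0) return(data.frame())
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # same column order everywhere
  })
  do.call(rbind, dfs)
}

# Example: the second pull has an extra column
a <- data.frame(id = "1", name = "x", stringsAsFactors = FALSE)
b <- data.frame(id = "2", name = "y", dose = "5", stringsAsFactors = FALSE)
bind_fill(list(a, b))
```

With this helper, `alloutput <- bind_fill(output)` would replace the `do.call(rbind, output)` line.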