Iterating API GET over dates and combining data sets in R


I am attempting to create a function that can iterate over a specified time span (e.g., last 30 days or last 90 days). I'm limited to 2,500 records per pull, so I may need to perform a pull for 1 day at a time, or 1 week at a time depending on my parameters.

I have looked at the question "API Query for loop", but can't quite get it to do what I want. I have created a while() loop that generates the URLs:

end_date   <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")

the_date <- start_date

while(the_date <= end_date)
{
  api <-  paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
               the_date,
               "^", 
               end_date,
               "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  the_date <- the_date + 1
  print(api)
}

[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-27^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-28^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-29^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-30^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."

Here is where I get stuck. I would like to create a function that iterates over each URL, and then combines the data.

When I perform a single pull, I use the following:

library(httr)
library(XML)

api_get  <- GET(url)
api_raw  <- rawToChar(api_get$content)
api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
api_df   <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))

Creating 30 of these is certainly not the most efficient way... hoping to get some help on this.


Solution

  • This script should work, assuming your URL construction and your parsing of the API response are correct.
    See the comments for details:

    library(httr)
    library(XML)
    
    end_date   <- Sys.Date()
    start_date <- as.Date("2020-01-27", format = "%Y-%m-%d") 
    the_date <- start_date
    
    #create an empty list
    output<-list()
    
    while(the_date <= end_date)
    {
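      #Track which date is being pulled - handy for debugging when the script errors
      print(the_date)
      #As in the question, each request spans the_date through end_date;
      #to pull one day at a time, the_date would likely go in both positions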
      url <-  paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                     the_date,
                     "^", 
                     end_date,
                     "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
    
      api_get  <- GET(url)
      api_raw  <- rawToChar(api_get$content)
      api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
    
      #Append the data frame to the list - item named by date
      output[[as.character(the_date)]]<-xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
      #brief pause between requests to avoid overloading the server
      Sys.sleep(0.7)
    
      the_date <- the_date + 1
    }
    
    #combine all of the dataframes in the output list into one large data frame
    alloutput<-do.call(rbind, output)
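
Since the question asks for a function that can pull one day or one week at a time, the loop above could be wrapped up roughly as follows. This is only a sketch under assumptions: the helper names (make_windows, build_url, fetch_pcr, pull_range) and the by_days and pause arguments are made up for illustration, the URL is the elided one from the question, and it assumes the API's "between" filter accepts the same date for both bounds.

```r
# Hypothetical helper: consecutive windows of `by_days` days covering
# start_date..end_date; the last window is cut off at end_date.
make_windows <- function(start_date, end_date, by_days = 1) {
  starts <- seq(start_date, end_date, by = by_days)
  ends   <- starts + by_days - 1
  ends[ends > end_date] <- end_date
  data.frame(start = starts, end = ends)
}

# Hypothetical helper: same URL pattern as in the question (still elided),
# vectorised over the window bounds.
build_url <- function(from, to) {
  paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
         from, "^", to,
         "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
}

# Single pull, using the same parsing steps as the question.
# Requires the httr and XML packages to be installed.
fetch_pcr <- function(url) {
  api_get  <- httr::GET(url)
  api_raw  <- rawToChar(api_get$content)
  api_tree <- XML::xmlTreeParse(api_raw, useInternalNodes = TRUE)
  XML::xmlToDataFrame(api_tree, nodes = XML::getNodeSet(api_tree, "//pcr"))
}

# Pull every window in turn and stack the results into one data frame.
# `fetch` can be swapped out, e.g. for a stub when testing offline.
pull_range <- function(start_date, end_date, by_days = 1,
                       fetch = fetch_pcr, pause = 0.7) {
  windows <- make_windows(start_date, end_date, by_days)
  urls    <- build_url(windows$start, windows$end)
  pieces  <- lapply(seq_along(urls), function(i) {
    if (i > 1) Sys.sleep(pause)  # pause between requests, as in the loop above
    fetch(urls[i])
  })
  names(pieces) <- as.character(windows$start)
  do.call(rbind, pieces)
}
```

A call such as pull_range(Sys.Date() - 30, Sys.Date(), by_days = 7) would then request the last 30 days one week at a time. Note that do.call(rbind, ...) expects every pull to return the same columns, so a window that returns a different set of fields would need extra handling.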