Tags: r, rlang, tidyeval, httr2

Build a dynamic-dots list to read dataframe one row at a time


I am trying to build GET requests one row at a time from a dataframe. The set of possible query parameters is large and varied, and the dataframe I'm passing to the function may not have column headers that match the parameter names.

SAMPLE STARTING DATA

library(tidyverse)
library(httr2)

dfInput <- structure(list(orig_zip = c("17502", "66616", "M1P2T7"),
                     orig_ctry = c("USA", "MEX", "CAN")),
                row.names = c(NA, 3L), class = "data.frame")
> dfInput
  orig_zip orig_ctry
1    17502       USA
2    66616       MEX
3   M1P2T7       CAN

Demonstration Function 1

This first function shows the desired output taken from each row of the dataframe, but here I have it hard-coded and am not actually using ... as I would like. As written, there is no flexibility in which parameters are chosen, but postalcode = df$orig_zip[i], country = df$orig_ctry[i] is the sort of code I need inside req_url_query():

f1 <- function(df, ...){

  req <- httr2::request("http://some/base/url")

  for (i in 1:nrow(df)){
    req %>%
      req_url_query(postalcode = df$orig_zip[i], country = df$orig_ctry[i]) %>%
      req_dry_run()
  }
}

> f1(dfInput, postalcode = orig_zip, country = orig_ctry)

GET /base/url?postalcode=17502&country=USA HTTP/1.1
***OUTPUT TRUNCATED***
GET /base/url?postalcode=66616&country=MEX HTTP/1.1
***OUTPUT TRUNCATED***
GET /base/url?postalcode=M1P2T7&country=CAN HTTP/1.1
***OUTPUT TRUNCATED***

Demonstration Function 2

This function makes use of ..., but I'm passing the actual values in the function call, and obviously this doesn't proceed through the dataframe.

f2 <- function(df, ...){
  
  req <- httr2::request("http://some/base/url")
  
  for (i in 1:nrow(df)){
    req %>%
      req_url_query(...) %>%
      req_dry_run()
  }
}

> f2(dfInput, postalcode = "17502", country = "USA")

GET /base/url?postalcode=17502&country=USA HTTP/1.1
***OUTPUT TRUNCATED***
GET /base/url?postalcode=17502&country=USA HTTP/1.1
***OUTPUT TRUNCATED***
GET /base/url?postalcode=17502&country=USA HTTP/1.1
***OUTPUT TRUNCATED***

Other Efforts

I made several attempts (not all shown) with various rlang quoting functions until I got to what you see below, but I feel that this is drifting too far from what should be a more straightforward procedure.

f3 <- function(df, ...){
  
  args <- list2(enexprs(...))
  print(args)
  a1 <- lapply(args, \(x) paste0(".data$", x, "[i]"))
  print(a1)
  # req <- httr2::request("http://some/base/url")
  
  # for (i in 1:nrow(df)){
  #   req %>%
  #     req_url_query(...) %>%
  #     req_dry_run()
  # }
}

> f3(dfInput, postalcode = orig_zip, country = orig_ctry)
[[1]]
[[1]]$postalcode
orig_zip

[[1]]$country
orig_ctry


[[1]]
[1] ".data$orig_zip[i]"  ".data$orig_ctry[i]"

I'm still working on this, but after burning a couple hours, I'm not getting it and could use some help. I appreciate any that you could offer. Thank you.

Objectives (in priority order):

  1. Write a package-quality function(s) that will accept a dataframe and an unspecified number of arguments that can generate GET requests one row at a time. I believe ... is necessary to do this, but am a padawanR, not a JediR, so I'm open-minded on that point.
  2. More broadly, I am trying to understand rlang, tidyeval, data-masking, quosures, quasiquotation, NSE, etc. I feel some of that is relevant here, but again... padawanR (with an 883 reputation after 9 years ;-) )
  3. More specifically to this problem as I laid out in my demonstration functions above, I have an odd situation that feels like it should be a somewhat common problem: I need to get list(postalcode = "17502", country = "USA") one row at a time from a dataframe with column names orig_zip and orig_ctry, but next time I use the function, I might need to pass list(address = "1 Infinite Loop", city = "Redmond", state = "TN") from a dataframe with columns orig_addr, orig_city, and orig_state. So I have a flexible function parameter (postalcode) that has to point to a dataframe column name (orig_zip), which in turn has to point to one value in that column at a time ("17502"). How do you do that?

Solution

  • The trick is to use the dots to build an intermediate data frame with transmute() that contains only the columns of interest, then inject each row into the call with !!!.

    Edit: You might prefer select() to transmute(). With transmute(), you can create new columns from expressions. With select(), you can use the full tidyselect syntax to pick existing columns.
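To see the injection step on its own, here is a minimal sketch that splices a one-row tibble into a function taking dynamic dots. It uses rlang::list2() as a stand-in for req_url_query(), on the assumption that both collect their dots the same way; the names row and spliced are only illustrative.

```r
library(rlang)
library(tibble)

# A one-row tibble is a named list of length-1 columns.
# (`row` and `spliced` are hypothetical names for this illustration.)
row <- tibble(postalcode = "17502", country = "USA")

# `list2()` supports dynamic dots, so `!!!row` is spliced into
# separate named arguments, one per column.
spliced <- list2(!!!row)

# Inspect the result: the column names become the argument names.
str(spliced)
```

The column names of the row become the argument names, which is why renaming the columns first is enough to control the query parameter names.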

    f <- function(df, ...) {
      req <- httr2::request("http://some/base/url")
    
      # Convert to a tibble so that `[` behaves predictably.
      # This also prevents a grouped data frame from disturbing the behaviour.
      # Use `transmute()` to reduce the data frame to the passed inputs.
      df <- df |> as_tibble() |> transmute(...)
    
      for (i in seq_len(nrow(df))) {
        # Inject row into the query with `!!!`
        req %>%
          req_url_query(!!!df[i, ]) %>%
          req_dry_run()
      }
    }
    
    f(dfInput, postalcode = orig_zip, country = orig_ctry)
    #> GET /base/url?postalcode=17502&country=USA HTTP/1.1
    #> Host: some
    #> User-Agent: httr2/0.2.3 r-curl/5.0.1 libcurl/7.85.0
    #> Accept: */*
    #> Accept-Encoding: deflate, gzip
    #>
    #> GET /base/url?postalcode=66616&country=MEX HTTP/1.1
    #> Host: some
    #> User-Agent: httr2/0.2.3 r-curl/5.0.1 libcurl/7.85.0
    #> Accept: */*
    #> Accept-Encoding: deflate, gzip
    #>
    #> GET /base/url?postalcode=M1P2T7&country=CAN HTTP/1.1
    #> Host: some
    #> User-Agent: httr2/0.2.3 r-curl/5.0.1 libcurl/7.85.0
    #> Accept: */*
    #> Accept-Encoding: deflate, gzip
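The select() alternative mentioned in the edit could look roughly like the sketch below. It is not part of the original answer: build_requests and its base_url argument are hypothetical names, and the function returns the request objects in a list instead of dry-running them, which may be more convenient inside a package.

```r
library(dplyr)
library(httr2)

dfInput <- data.frame(
  orig_zip  = c("17502", "66616", "M1P2T7"),
  orig_ctry = c("USA", "MEX", "CAN")
)

# Hypothetical variant of `f()`; the name and `base_url` argument are assumptions.
build_requests <- function(df, ..., base_url = "http://some/base/url") {
  req <- request(base_url)

  # `select()` renames while selecting, so `postalcode = orig_zip`
  # yields a column called `postalcode`.
  params <- df |> as_tibble() |> select(...)

  # Build one request per row; each row is spliced into the query with `!!!`.
  lapply(seq_len(nrow(params)), function(i) {
    req_url_query(req, !!!params[i, ])
  })
}

reqs <- build_requests(dfInput, postalcode = orig_zip, country = orig_ctry)

# One request per input row; each can be passed to req_dry_run() or req_perform().
length(reqs)
reqs[[1]]$url
```

Returning the requests leaves it to the caller to decide whether to inspect them with req_dry_run() or send them with req_perform().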