Tags: r, dplyr, non-standard-evaluation

Using Dynamic Dots for Indirection in R / Tidyverse


I have a dataset with a number of date columns in Excel serial date format. I've managed to convert the dates to POSIXct using the following simple mutate():


myDataSet_wrangled <- myDataSet %>%
    mutate(startDate = as.POSIXct(as.numeric(startDate) * 3600 * 24, origin = "1899-12-30", tz = "GMT"))

However, when I try to refactor this into a function of the form convertDate(df, ...), I can't wrap my head around how to pass the column names through correctly. Frustratingly, the following code works with one column name, but when I pass multiple column names it fails with the error "Error in 'mutate()': ... ! object 'endDate' not found".

myDataSet <- data.frame(
  startDate = c(44197.924, 44258.363, 44320.634), # dates in Excel format
  endDate = c(44201.131, 44270.859, 44330.023)
)

convertXlDateToPOSIXct <- function(df, ..., epoch = "1899-12-30", timezone = "GMT") {
  cols <- enquos(...)
  df <- df %>%
    mutate(across(!!!cols, ~ as.POSIXct(as.numeric(.x) * 3600 * 24, origin = epoch, tz = timezone)))
  return(df)
}

# Call with one column
myDataSet_wrangled <- myDataSet %>% 
    convertXlDateToPOSIXct(startDate)
# startDate correctly converted, no error thrown


# Call with multiple columns
myDataSet_wrangled <- myDataSet %>% 
    convertXlDateToPOSIXct(startDate, 
                           endDate)
# 404: endDate Not Found

I've tried various combinations of ..., enquos(), ensyms(), and !!!, but I think I'm fundamentally misunderstanding how data masking works in R.


Solution

  • The R documentation (topic-data-mask-programming {rlang}) notes that forwarding ... arguments requires no special syntax, and demonstrates this with calls such as group_by(...).

    I hadn't been able to work out why this syntax wasn't working in the code above, but (with thanks to @lotus) I've realised that the real problem isn't that ... wasn't properly enquo'd or ensym'd. Rather, across() wants its column selection as a single argument, whereas forwarding ... hands it n separate arguments, so only the first is treated as the selection. Wrapping ... in c() provides the column names in the expected format.

    convertXlDateToPOSIXct <- function(df, ..., epoch = "1899-12-30", timezone = "GMT") {
      # c(...) gathers the forwarded columns into the single selection across() expects
      df <- df %>%
        mutate(across(c(...), ~ as.POSIXct(as.numeric(.x) * 3600 * 24, origin = epoch, tz = timezone)))
      return(df)
    }

    Alternatively, without the enclosing c(), calling convertXlDateToPOSIXct(df, c(startDate, endDate)) would also work correctly, although in that case it would make more sense to use a named parameter, e.g. convertXlDateToPOSIXct <- function(df, cols, epoch = "1899-12-30", timezone = "GMT").
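
For completeness, here is a minimal sketch of what the named-parameter variant might look like. The signature is the one suggested above; the body is an assumption on my part, using the embrace operator `{{ cols }}` (the pattern the dplyr programming vignette shows for forwarding a selection into across()). The sample data is copied from the question.

```r
library(dplyr)

# Assumed body for the named-parameter variant: `cols` takes one tidyselect
# expression, and {{ cols }} forwards it unevaluated into across().
convertXlDateToPOSIXct <- function(df, cols, epoch = "1899-12-30", timezone = "GMT") {
  df %>%
    mutate(across({{ cols }}, ~ as.POSIXct(as.numeric(.x) * 3600 * 24, origin = epoch, tz = timezone)))
}

# Sample data from the question (dates in Excel serial format)
myDataSet <- data.frame(
  startDate = c(44197.924, 44258.363, 44320.634),
  endDate = c(44201.131, 44270.859, 44330.023)
)

# Several columns are passed as a single selection wrapped in c()
myDataSet_wrangled <- myDataSet %>%
  convertXlDateToPOSIXct(c(startDate, endDate))
```

Because `cols` is an ordinary tidyselect expression, a call such as convertXlDateToPOSIXct(myDataSet, ends_with("Date")) should work the same way, at the cost of the caller having to write c() whenever more than one bare column name is listed.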