Tags: r, function, dplyr, tidyeval

Passing a string to an R function and using it as a column name within the function


I have a data frame of student scores, with one column per subject. I want to run the calculation below for each subject (Math, Science and Reading):

avgdata_math <- data %>% 
   group_by(country) %>% 
   summarise(ci = list(bootstrap_ci(sex, Math, weight))) %>% 
   unnest_wider(ci) %>% 
   ungroup() %>% 
   mutate(country = fct_reorder(country, avg))

Since I would otherwise have to repeat the same code for the other two subjects, I want to wrap the calculation in a function (without pivoting the data frame):

aus_nz <- function(df, subject = "Math") {
   df %>%
    group_by(country) %>% 
    summarise(ci = list(bootstrap_ci(sex, subject, weight))) %>% 
    unnest_wider(ci) %>% 
    ungroup() %>% 
    mutate(country = fct_reorder(country, avg))
}

This gives me an error. I pass the column name (subject) as a string, so after the data is grouped, bootstrap_ci receives the string itself (e.g. "Math") rather than the corresponding column of each group.


Solution

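  • Using !! rlang::ensym(subject) in your function should work: ensym() turns the string supplied for subject into a symbol, and !! unquotes it so that summarise() looks it up as a column.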

    aus_nz <- function(df, subject = "Math") {
       df %>%
        group_by(country) %>% 
        summarise(ci = list(bootstrap_ci(sex, !! rlang::ensym(subject), weight))) %>% 
        unnest_wider(ci) %>% 
        ungroup() %>% 
        mutate(country = fct_reorder(country, avg))
    }
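
One caveat with ensym(): it captures the expression written in the call, so a string held in a variable (e.g. subj <- "Science"; aus_nz(df, subj)) would become the symbol subj rather than Science. A minimal sketch of an alternative that uses dplyr's .data pronoun follows. Since bootstrap_ci() is not shown in the question, the sketch substitutes base R's weighted.mean(); the toy data and the function name avg_by_country are made up for illustration.

```r
library(dplyr)

# Toy data: column names mirror the question, values are invented.
scores <- tibble(
  country = c("AUS", "AUS", "NZ", "NZ"),
  Math    = c(500, 520, 480, 510),
  Science = c(510, 530, 490, 500),
  weight  = c(1, 3, 1, 1)
)

# Hypothetical helper: `subject` is a string naming the column to summarise.
# weighted.mean() stands in for the question's bootstrap_ci().
avg_by_country <- function(df, subject = "Math") {
  df %>%
    group_by(country) %>%
    summarise(avg = weighted.mean(.data[[subject]], weight), .groups = "drop")
}

# The column name can live in a variable, which ensym() would not handle.
subj <- "Science"
res <- avg_by_country(scores, subj)
print(res)
```

.data[[subject]] evaluates subject first and then looks the resulting string up among the columns of the current group, so it works for literal strings and for variables alike.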
    

    Update

    If you also want to pass the grouping variable as a string, and you sometimes need to group by more than one variable, then !!!, rlang::ensyms() and the ellipsis argument (...) would do the trick, were it not for the last line of your function: fct_reorder() expects a single factor to reorder. With two grouping variables, what would you do? Create two new variables and reorder each grouping variable by avg? It would also help to see your data (maybe with dput(head(...))).

    aus_nz <- function(df, subject = "Math", ...) {
      # Capture the grouping variables passed via ... as a list of symbols
      group_var <- rlang::ensyms(...)

      df %>%
        group_by(!!! group_var) %>%
        summarise(ci = list(bootstrap_ci(sex, !! rlang::ensym(subject), weight))) %>%
        unnest_wider(ci) %>%
        ungroup() # %>% last line needs to be fixed
        # mutate(grouped_by = fct_reorder(!!! group_var, avg))
    } 
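
To illustrate how the ellipsis version would be called, here is a small self-contained sketch. ensyms() accepts both strings and bare column names, so both calling styles below should give the same grouping. The toy data and the helper count_by are hypothetical; the helper only counts rows so that the example runs without bootstrap_ci().

```r
library(dplyr)

# Toy data, invented for illustration.
scores <- tibble(
  country = c("AUS", "AUS", "NZ", "NZ", "NZ"),
  sex     = c("F", "M", "F", "M", "M"),
  Math    = c(520, 500, 480, 510, 490)
)

# Hypothetical helper: group by whatever is passed through `...`
count_by <- function(df, ...) {
  group_var <- rlang::ensyms(...)   # strings or bare names -> list of symbols
  df %>%
    group_by(!!!group_var) %>%      # splice the symbols into group_by()
    summarise(n = n(), .groups = "drop")
}

by_string <- count_by(scores, "country", "sex")
by_symbol <- count_by(scores, country, sex)
print(by_string)
```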
    

    If you do not want to use the ellipsis argument, you can use rlang::syms and a character vector (with one or multiple elements) instead:

    aus_nz <- function(df, subject = "Math", group = "country") {
      # Convert the character vector of group names into a list of symbols
      group_var <- rlang::syms(group)

      df %>%
        group_by(!!! group_var) %>%
        summarise(ci = list(bootstrap_ci(sex, !! rlang::ensym(subject), weight))) %>%
        unnest_wider(ci) %>%
        ungroup() # %>% last line needs to be fixed
        # mutate(grouped_by = fct_reorder(!!! group_var, avg))
    }
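
As for the open question about the last line, one possible approach (an assumption about what is wanted, not something the question specifies) is to collapse the grouping variables into a single factor with base R's interaction() and reorder that combined factor by avg. The sketch below uses toy data and weighted.mean() in place of bootstrap_ci(); the name avg_by_groups is made up.

```r
library(dplyr)
library(forcats)

# Toy data, invented for illustration.
scores <- tibble(
  country = c("AUS", "AUS", "NZ", "NZ"),
  sex     = c("F", "M", "F", "M"),
  Math    = c(520, 500, 480, 510),
  weight  = c(1, 1, 1, 1)
)

# Hypothetical helper: `group` is a character vector of column names.
avg_by_groups <- function(df, subject = "Math", group = "country") {
  group_var <- rlang::syms(group)
  df %>%
    group_by(!!!group_var) %>%
    # weighted.mean() stands in for the question's bootstrap_ci()
    summarise(avg = weighted.mean(.data[[subject]], weight), .groups = "drop") %>%
    # Combine all grouping columns into one factor, then order it by avg
    mutate(grouped_by = fct_reorder(interaction(!!!group_var, sep = "_"), avg))
}

res <- avg_by_groups(scores, group = c("country", "sex"))
print(levels(res$grouped_by))
```

Whether a combined factor is the right answer depends on how the result is plotted or reported; reordering each grouping variable separately would be a different design.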