r tidyverse rlang

How to get tidy dots to accept a variable range


I found this very helpful article on how to write a function that accepts a variable number of arguments using quosures and tidy dots. Here's some of the code:

library(dplyr)
library(rlang)

my.summary <- function(df.name=df_tp1, group_var, ...) {
    # Capture the grouping variable and the summary variables as quosures
    group_var <- enquo(group_var)
    smry_vars <- enquos(..., .named = TRUE)

    # Build one mean() call per captured variable
    the.mean <- purrr::map(smry_vars, function(var) {
        expr(mean(!!var, na.rm = TRUE))
    })
    names(the.mean) <- paste0("mean-", names(the.mean))

    df.name %>%
        group_by(!!group_var) %>%
        summarise(!!!the.mean)
}

The problem is I have to call the function with a long string of variables, like this:

cm_all1 <- my.summary(df_tp1_cm, group_var=net_role, so_part_value, cult_ci, cult_sn, cult_ebc, sl_t_lrn, sl_xt_lrn, nl_netops_km, so_rt, nl_netops_trust)

I would be very happy to be able to just call it with something like

so_part_value:nl_netops_trust

instead, but this gives errors like this:

Error in so_part_value:nl_netops_trust : NA/NaN argument
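
The message can be reproduced outside the function. The following is a minimal sketch, assuming two made-up vectors `a` and `b` that stand in for columns (they are not from the real data). When the range is captured by enquos() and spliced into mean(), the `:` is evaluated as base R's sequence operator on the column vectors rather than as a column-range selector, so a missing value in the first element of either column should raise the same error:

```r
# Hypothetical stand-ins for two data frame columns (not the real data).
a <- c(NA, 2, 3)
b <- c(4, 5, 6)

# Inside mean(a:b), `:` is base R's sequence operator. It only looks at the
# first element of each operand, and an NA there is an error.
msg <- suppressWarnings(
    tryCatch(
        mean(a:b, na.rm = TRUE),
        error = function(e) conditionMessage(e)
    )
)
print(msg)
```

A column range such as a:b only has its "from column a to column b" meaning inside tidyselect-aware functions such as select().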

I also tried putting the variable names in a character vector and then using enquo() and !! but that didn't work.

I'd appreciate any ideas.

Here is my rewrite of the function using Yifu's ideas. This works for my fake data set but not the real data.

my.summary <- function(df.name=df_tp1, group_var, ...) {
##    group_var <- enquo(group_var)
    smry_vars <- df.name %>% select(...) %>% colnames()

    df.name %>%
        ##        group_by(!!group_var) %>%
        group_by({{group_var}}) %>%
        summarise_at(smry_vars,
                     list(mean=function(x) mean(x, na.rm=TRUE),
                          sd=function(x) sd(x, na.rm=TRUE),
                          min=function(x) min(x, na.rm=TRUE),
                          max=function(x) max(x, na.rm=TRUE),
                          q1=function(x) quantile(x, .25, na.rm=TRUE),
                          q2=function(x) quantile(x, .50, na.rm=TRUE),
                          q3=function(x) quantile(x, .75, na.rm=TRUE),
                          n=function(x) n()
                          ))
}

Solution

  • You just need to make sure ... is evaluated in the right context (here, the data frame you pass in). Passing ... to select() resolves a range such as a:b against the data frame's columns, and colnames() then extracts the selected column names.

    library(dplyr)
    library(rlang)

    get_column_range <- function(df, ...) {
        # select() resolves ranges such as a:b against df's columns
        column_names <- df %>% select(...) %>% colnames()

        writeLines("Column names as string:")
        print(column_names)
        writeLines("Convert back to symbols")
        print(syms(column_names))
    }

    get_column_range(df = iris, Sepal.Length:Petal.Width)
    
    Column names as string:
    [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
    Convert back to symbols
    [[1]]
    Sepal.Length
    
    [[2]]
    Sepal.Width
    
    [[3]]
    Petal.Length
    
    [[4]]
    Petal.Width
    

    The dplyr functions with the _at suffix also accept column names as strings, so you don't have to convert them to quosures and then unquote them.

    Note that {{}} is an easier syntax to learn; it quotes and unquotes in one step:

    my.summary <- function(df,group_var,...){
        column_names <- df %>% select(...) %>% colnames()
    
        df %>%
            group_by({{group_var}}) %>%
            summarise_at(column_names,list(mean = mean))
    }
    
    my.summary(df = iris,group_var = Species,Sepal.Length:Petal.Width)
    
    # A tibble: 3 x 5
      Species    Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
      <fct>                  <dbl>            <dbl>             <dbl>            <dbl>
    1 setosa                  5.01             3.43              1.46            0.246
    2 versicolor              5.94             2.77              4.26            1.33 
    3 virginica               6.59             2.97              5.55            2.03 
    

    For more, see the rlang quotation reference: https://rlang.r-lib.org/reference/quotation.html
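
In dplyr 1.0.0 and later, the scoped verbs such as summarise_at() are superseded by across(). Below is a minimal sketch of the same idea in that style, assuming dplyr >= 1.0.0 is installed. The function name my_summary2 is hypothetical, and na.rm = TRUE is added here only because the question's version used it:

```r
library(dplyr)

# Hypothetical variant of my.summary that uses across() instead of
# summarise_at(); the name my_summary2 is made up for this sketch.
my_summary2 <- function(df, group_var, ...) {
    # Resolve the dots (for example a column range) against df via select()
    column_names <- df %>% select(...) %>% colnames()

    df %>%
        group_by({{ group_var }}) %>%
        summarise(
            # all_of() selects columns from a character vector of names
            across(all_of(column_names),
                   list(mean = ~ mean(.x, na.rm = TRUE))),
            .groups = "drop"
        )
}

res <- my_summary2(iris, Species, Sepal.Length:Petal.Width)
print(res)
```

With a named list of functions, across() names the results as column_function by default, so the output columns should match those of the summarise_at() version above.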