Dplyr Non Standard Evaluation -- Help Needed


I am taking my first steps with non-standard evaluation (NSE) in dplyr. Consider the following snippet: it takes a tibble, sorts it in descending order by the values in a column, keeps the top k values and replaces the remaining n - k lower values with "Other".

See for instance:

library(dplyr)

df <- cars %>% as_tibble()

k <- 3

df2 <- df %>%
  arrange(desc(dist)) %>%
  mutate(dist2 = factor(c(dist[1:k],
                          rep("Other", n() - k)),
                        levels = c(dist[1:k], "Other")))

What I would like is a function such that:

df2bis <- df %>% sort_keep(old_column, new_column, levels_to_keep)

produces the same result, where old_column is "dist" (the column I use to sort the data set), new_column is "dist2" (the column I generate), and levels_to_keep is k (the number of values I explicitly retain). I am getting lost in enquo, quo_name, etc.

Any suggestion is appreciated.


Solution

  • You can do:

    library(dplyr)
    
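    sort_keep <- function(df, old_column, new_column, levels_to_keep) {
      # Capture the column expression so it can be unquoted with !! below
      old_column <- enquo(old_column)
      # Turn the bare new-column name into a string for the left side of :=
      new_column <- as.character(substitute(new_column))
      df %>%
        arrange(desc(!!old_column)) %>%
        # 'use' is a temporary copy of the sorting column, dropped at the end
        mutate(use = !!old_column,
               !!new_column := factor(c(use[1:levels_to_keep],
                                        rep("Other", n() - levels_to_keep)),
                                      levels = c(use[1:levels_to_keep], "Other")),
               use = NULL)
    }

    df %>% sort_keep(dist, dist2, 3)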
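
As an alternative, and assuming a reasonably recent dplyr (1.0 or later), the same function can probably be written more compactly with the embrace operator `{{ }}`, which captures and unquotes a column argument in one step and also works inside a string on the left side of `:=`. This is only a sketch: the name `sort_keep2` is made up here, and it assumes that `levels_to_keep` is no larger than the number of rows and that the top values are distinct (duplicated factor levels would raise an error).

```r
library(dplyr)

# Hypothetical variant of sort_keep using {{ }} instead of enquo() and !!
sort_keep2 <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    # Sort in descending order of the column passed as old_column
    arrange(desc({{ old_column }})) %>%
    mutate(
      # "{{ new_column }}" names the new column after the bare argument
      "{{ new_column }}" := factor(
        c(head({{ old_column }}, levels_to_keep),
          rep("Other", n() - levels_to_keep)),
        levels = c(head({{ old_column }}, levels_to_keep), "Other")
      )
    )
}

df <- as_tibble(cars)
df2bis <- df %>% sort_keep2(dist, dist2, 3)
```

Here `head(x, k)` plays the role of `x[1:k]` from the question, so no temporary `use` column is needed.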