
NSE vs SE in mutate_


I've read a lot about it, but I still cannot understand NSE (non-standard evaluation) vs SE (standard evaluation) in R. I hope somebody can explain it properly.

library(dplyr)
df <- data.frame(a = 1:6, b = 7:12, c = 13:18, d = rep(c("a", "b"), each = 3))

This is what I'm used to, and it works:

df %>% group_by(d) %>% mutate(new=sum(a))

Now I'm in new territory. This works, but it throws a warning. Can somebody explain how I'm supposed to do this if not with group_by_()?

var="d"
df %>% group_by_(`var`) %>% mutate(new=sum(a))

Warning message: group_by_() is deprecated. Please use group_by() instead

Now, on to what I'm really trying to do. This just throws an error:

var="d"
var2="a"
df %>% group_by_(`var`) %>% mutate_(new=sum(`var2`))

Error in sum(var2) : invalid 'type' (character) of argument

I'm really trying to understand the fundamentals here... thanks!
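The error in the last snippet can be reproduced without dplyr, and doing so shows the SE/NSE distinction itself. The sketch below is base R only: `quote()`, `eval()`, `call()` and `as.name()` are used as an analogy for what dplyr does with its arguments, not as dplyr's actual implementation. It assumes, as the error message suggests, that the deprecated `mutate_()` evaluated `sum(var2)` as ordinary R code before doing anything else.

```r
df <- data.frame(a = 1:6, b = 7:12, c = 13:18, d = rep(c("a", "b"), each = 3))
var2 <- "a"

# Standard evaluation: var2 is evaluated first and yields the string "a",
# so sum() receives a character value and fails as in the question.
se_error <- tryCatch(sum(var2), error = function(e) conditionMessage(e))
print(se_error)

# Non-standard evaluation: quote() captures the expression sum(a) unevaluated;
# eval() then looks up `a` among the columns of df (a list-like environment).
nse_result <- eval(quote(sum(a)), df)
print(nse_result)  # 21

# Bridging the two: turn the string into a symbol and build the call sum(a),
# then evaluate that call against the data frame's columns.
built_call <- call("sum", as.name(var2))
print(built_call)  # sum(a)
spliced_result <- eval(built_call, df)
print(spliced_result)  # 21
```

Building the call from `as.name(var2)` plays roughly the same role as `sym()` plus `!!` in tidy evaluation: the string becomes a symbol before the expression is evaluated against the data.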


Solution

  • The book on Tidy Evaluation is a fantastic resource to learn about NSE. You may find Chapter 8 particularly useful.

    In your case, you first need to convert your character strings into symbols with sym() (from rlang, also re-exported by dplyr):

    s1 <- sym(var)
    s2 <- sym(var2)
    

    If you used s1 and s2 in dplyr directly, dplyr would quote those names themselves and look for columns literally called s1 and s2 in your data frame. This is not what you want. Instead, you want to use the symbols stored inside s1 and s2. You can do this with the unquote operator !!:

    df %>% group_by( !!s1 ) %>% mutate( new=sum(!!s2) )
    
    ## Or putting everything together
    df %>% group_by( !!sym(var) ) %>% mutate( new=sum(!!sym(var2)) )
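Putting the pieces together, here is a self-contained sketch of the same approach, assuming dplyr and rlang are installed. `rlang::sym()` is called with its namespace so the example does not depend on dplyr's re-export, and the trailing `ungroup()` is an addition so the result is a plain tibble.

```r
library(dplyr)

df <- data.frame(a = 1:6, b = 7:12, c = 13:18, d = rep(c("a", "b"), each = 3))
var  <- "d"  # grouping column, given as a string
var2 <- "a"  # column to sum, given as a string

# rlang::sym() turns each string into a symbol; !! splices that symbol into
# the call, so group_by() and mutate() see d and a as if typed directly.
res <- df %>%
  group_by(!!rlang::sym(var)) %>%
  mutate(new = sum(!!rlang::sym(var2))) %>%
  ungroup()

print(res)
# new is 6 for the three rows of group "a" (1 + 2 + 3)
# and 15 for the three rows of group "b" (4 + 5 + 6)
```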