Programming with dplyr: Renaming a column with variable using glue syntax


I've read through Programming with dplyr and understand that rename() and select() use tidy selection. I'm trying to combine this with the glue syntax to write a custom function using the new double curly syntax (rlang v0.4.0), but I'm getting extra quotation marks in the column name:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

sel_var = "homeworld"

# Attempt at using (newer) double curly syntax:
starwars %>% 
  select("{{sel_var}}_old" := {{ sel_var }})
#> # A tibble: 87 x 1
#>    `"homeworld"_old`
#>    <chr>            
#>  1 Tatooine               
#> # ... with 77 more rows

# Working, but uglier (and older) bang bang syntax:
starwars %>% 
  select(!!sym(paste0(sel_var, "_old")) := {{ sel_var }})
#> # A tibble: 87 x 1
#>    homeworld_old
#>    <chr>        
#>  1 Tatooine          
#> # ... with 77 more rows

Created on 2021-02-16 by the reprex package (v0.3.0)

How can I avoid the extra quotation marks in `"homeworld"_old` using the double curly {{ }} and glue := syntax? This is shown to work for summarise("mean_{{expr}}" := mean({{ expr }}), ...) in a function here.


Solution

  • The {{ operator inside the glue mechanism works at the level of expressions, not strings. A string is itself an expression, and its quotes (") are part of that expression, which is why they appear in the output. If you convert your string to a variable name (a symbol), everything works as expected:

    sel_var <- as.name("homeworld")
    
    starwars %>% 
      select("{{sel_var}}_old" := {{ sel_var }})
    # # A tibble: 87 x 1
    #    homeworld_old
    #    <chr>        
    #  1 Tatooine     
    #  2 Tatooine     
    # ...
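
    A minimal sketch of why the quotes appear, assuming the glue-style name is built by labelling the injected expression roughly the way rlang::as_label() does (an assumption about the internals, not something confirmed above). A string labels with its quotes; a symbol labels as a bare name:

    ```r
    library(rlang)

    # A string is a constant expression, so its label keeps the quotes
    label_from_string <- as_label("homeworld")

    # A symbol (variable name) labels as the bare name
    label_from_symbol <- as_label(as.name("homeworld"))

    print(label_from_string)
    print(label_from_symbol)
    ```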
    

    Note that the summarise("mean_{{expr}}" := mean({{ expr }}), ...) example you linked behaves the same way. For example, here's one of the functions defined in that vignette:

    my_summarise5 <- function(data, mean_var, sd_var) {
      data %>% 
        summarise(
          "mean_{{mean_var}}" := mean({{ mean_var }}), 
          "sd_{{sd_var}}" := mean({{ sd_var }})
        )
    }
    

    Everything works as expected when you pass variable names to the function:

    my_summarise5( mtcars, mpg, mpg )
    #   mean_mpg   sd_mpg
    # 1 20.09062 20.09062
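
    The same pattern can be applied to the select() case from the question. This is only a sketch: select_old() is a hypothetical helper name, and it assumes the column is passed as a bare name rather than a string:

    ```r
    library(dplyr)

    # Hypothetical helper: `var` is expected to be a bare column name
    select_old <- function(data, var) {
      data %>%
        select("{{ var }}_old" := {{ var }})
    }

    res <- select_old(starwars, homeworld)
    names(res)
    ```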
    

    However, passing strings will include " in the output, as in your case:

    my_summarise5( mtcars, "mpg", "mpg" )
    #   mean_"mpg" sd_"mpg"
    # 1         NA       NA
    # Warning messages:
    # 1: In mean.default(~"mpg") :
    #   argument is not numeric or logical: returning NA
    # 2: In mean.default(~"mpg") :
    #   argument is not numeric or logical: returning NA
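
    If the column name has to stay a string, one alternative is to interpolate its value with single braces instead of converting it to a symbol. This sketch is not part of the approach above. It assumes a dplyr/rlang version that supports single-curly glue names and uses tidyselect's all_of() to select by string:

    ```r
    library(dplyr)

    sel_var <- "homeworld"

    # "{sel_var}" inserts the string's value into the new name;
    # all_of() selects the column whose name is stored in sel_var
    res <- starwars %>%
      select("{sel_var}_old" := all_of(sel_var))

    names(res)
    ```

    Single braces interpolate the value of a variable, whereas double braces label an expression, so the string's quotes never enter the name.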