Paste element of a vector into dplyr function


I have the following dataset:

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

and this vector:

v <- c("a", "b", "c")

Now I want to create a new dataset and summarise a, b, and c by creating new variables (y_a, y_b, and y_c) that calculate the mean of each variable grouped by year.

The code for doing this is the following:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(y_a = mean(a, na.rm = TRUE),
                y_b = mean(b, na.rm = TRUE),
                y_c = mean(c, na.rm = TRUE))

However, I want to use the vector v to read the respective variable name from it and paste it into the summarise function:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) = mean(as.name(v[1]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[2])) = mean(as.name(v[2]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[3])) = mean(as.name(v[3]), na.rm = TRUE))

Doing so, I receive the following error message:

Error: unexpected '=' in "y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) ="

How can I paste the value of a vector in this summarise function so that it works?


Solution

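  • To define a new variable on the left-hand side from a computed name, you need := instead of =, because R's parser does not accept a function call as an argument name in front of =. Since you build the name with paste0, you also need !! to inject the resulting string so that it is used as the column name. To access an existing column in dplyr through a string stored in a variable, the .data pronoun (for example .data[[v[1]]]) is the easiest way; as.name(v[1]) inside mean() would hand mean() a symbol rather than the column.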

    library(dplyr)
    
    df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                       a = c(7, 3, 5),
                       b = c(5, 8, 1),
                       c = c(8, 4, 3))
    
    v <- c("a", "b", "c")
    
    df_x %>% group_by(year) %>% 
      dplyr::summarise(!!paste0("y_", v[1]) := mean(.data[[v[1]]], na.rm = TRUE),
                       !!paste0("y_", v[2]) := mean(.data[[v[2]]], na.rm = TRUE),
                       !!paste0("y_", v[3]) := mean(.data[[v[3]]], na.rm = TRUE))
    #> # A tibble: 3 × 4
    #>    year   y_a   y_b   y_c
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1  2000     5  4.67     5
    #> 2  2001     5  4.67     5
    #> 3  2002     5  4.67     5
    

    Created on 2022-12-21 by the reprex package (v1.0.0)
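
If the goal is to apply the same summary to every name in v, repeating one line per element may not be necessary. The following is a minimal sketch of an alternative, not part of the original answer, assuming dplyr 1.0.0 or later: across() with all_of(v) selects the columns named in the vector, and the .names glue pattern adds the y_ prefix.

```r
library(dplyr)

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

v <- c("a", "b", "c")

y <- df_x %>%
  group_by(year) %>%
  # all_of(v) selects the columns named in v;
  # .names builds y_a, y_b, y_c from each selected column name
  summarise(across(all_of(v), ~ mean(.x, na.rm = TRUE), .names = "y_{.col}"))
```

With this form, adding another column name to v should be enough to get another y_ column, with no index to keep in sync by hand.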