Search code examples
rdplyrpurrrtidyeval

R using dplyr::mutate() within purrr::map without duplicating rows


Here is data:

library(tidyverse)
col_pre <- c('a', 'b', 'c')
df <- tibble(a1 = 1:3, a2 = 4:6, b1 = 7:9, b2 = 10:12, c1 = 13:15, c2 = 16:18)

I want to use purrr::map() and dplyr::mutate() to create three new columns that are the sums of columns in df. I can use map() to iterate over a vector of the a, b, c column prefixes. I figured out the tidyeval operations so that the code below runs without error.

out <- col_pre %>%
  map_df(~ df %>% 
            mutate(!!as.name(paste0(.x, '3')) := !!as.name(paste0(.x, '1')) + !!as.name(paste0(.x, '2')))
  )

However, out now has six spurious rows:

     a1    a2    b1    b2    c1    c2    a3    b3    c3
1     1     4     7    10    13    16     5    NA    NA
2     2     5     8    11    14    17     7    NA    NA
3     3     6     9    12    15    18     9    NA    NA
4     1     4     7    10    13    16    NA    17    NA
5     2     5     8    11    14    17    NA    19    NA
6     3     6     9    12    15    18    NA    21    NA
7     1     4     7    10    13    16    NA    NA    29
8     2     5     8    11    14    17    NA    NA    31
9     3     6     9    12    15    18    NA    NA    33

What it's done is unnecessarily replicate the three rows of the input df.

Here is the output I want:

     a1    a2    b1    b2   c1    c2    a3     b3    c3
1     1     4     7    10    13    16     5    17    29
2     2     5     8    11    14    17     7    19    31
3     3     6     9    12    15    18     9    21    33

I have a feeling purrr::reduce() could be the solution, but I'm unsure how to apply it.

Any help is appreciated!


Solution

  • We can convert the strings to symbol before doing the evaluation, instead of mutate use transmute and later bind the columns with the original dataset

    library(stringr)
    library(purrr)
    library(dplyr)
    col_pre %>%
         map_dfc(~ df %>%
               transmute(!! str_c(.x, '3') :=  !! rlang::sym(str_c(.x, '1'))  + 
             !! rlang::sym(str_c(.x, 2)))) %>%
         bind_cols(df, .)
    # A tibble: 3 x 9
    #    a1    a2    b1    b2    c1    c2    a3    b3    c3
    #   <int> <int> <int> <int> <int> <int> <int> <int> <int>
    #1     1     4     7    10    13    16     5    17    29
    #2     2     5     8    11    14    17     7    19    31
    #3     3     6     9    12    15    18     9    21    33
    

    Or another option is parse_exprs

    df %>%
        mutate(!!! rlang::parse_exprs(str_c(sprintf("%s1 + %s2",
               col_pre, col_pre), collapse=";"))) %>% 
       rename_at(vars(contains("+")), ~ str_c(col_pre, 3))
    # A tibble: 3 x 9
    #     a1    a2    b1    b2    c1    c2    a3    b3    c3
    #  <int> <int> <int> <int> <int> <int> <int> <int> <int>
    #1     1     4     7    10    13    16     5    17    29
    #2     2     5     8    11    14    17     7    19    31
    #3     3     6     9    12    15    18     9    21    33
    

    Or another option is to convert it to 'long' format with pivot_longer and then do the calculation

    library(tidyr)
    df %>%
       mutate(rn = row_number()) %>%
       pivot_longer(cols = -rn, names_to = c(".value", "group"),
              names_sep ="(?<=[a-z])(?=[0-9])") %>%
       group_by(rn) %>%
       summarise_at(vars(col_pre), list(`3` = sum)) %>% 
       select(-rn) %>%
       bind_cols(df, .)
    

    Or if we use the devel version of dplyr (‘0.8.99.9000’), then across can be used along with summarise

    df %>%
         mutate(rn = row_number()) %>%
         pivot_longer(cols = -rn, names_to = c(".value", "group"),
               names_sep ="(?<=[a-z])(?=[0-9])") %>%
         group_by(rn) %>%
         summarise(across(col_pre, sum)) %>% 
         select(-rn) %>%
         rename_all(~ str_c(., 3)) %>% 
         bind_cols(df, .)
    # A tibble: 3 x 9
    #     a1    a2    b1    b2    c1    c2    a3    b3    c3
    #  <int> <int> <int> <int> <int> <int> <int> <int> <int>
    #1     1     4     7    10    13    16     5    17    29
    #2     2     5     8    11    14    17     7    19    31
    #3     3     6     9    12    15    18     9    21    33