Search code examples
rdplyrdata-cleaning

Rename multiple variables at once using dplyr


I am trying to rename multiple variables at once using Dplyr. Here is what the data looks like:

ID   Satisfaction_Baseline   Satisfaction_FollowUp   Satisfaction_Exit
1               1                      3                     4
2               5                      5                     5
3               5                      3                     4
4               5                      3                     2
5               5                      3                     5           

There is a lot of variables that have '_Baseline', '_FollowUp' and '_Exit'. I really would like a shorthand way to rename variables at once so the data looks more like this:

    ID   Satisfaction_1         Satisfaction_2        Satisfaction_3
1               1                      3                     4
2               5                      5                     5
3               5                      3                     4
4               5                      3                     2
5               5                      3                     5  

I tried this:

rename(D_wide,  Satisfaction_1 = Satisfaction_Baseline)

This works... but I wanted to know if there was a succinct way to write this piece of code so every variable that ends with '_Baseline', '_FollowUp' and '_Exit' end with '_1', '_2' and '_3'. Thank you!


Solution

  • 1) dplyr Since there are only a small number of keywords (Baseline, FollowUp and Exit) we can simply pipe the second argument of rename_with through three sub calls. If the keywords do not appear in other column names as is the situation here we could optionally omit the $ signs.

    library(dplyr)
    
    df %>%
      rename_with(. %>%
        sub("Baseline$", 1, .) %>%
        sub("FollowUp$", 2, .) %>%
        sub("Exit$", 3, .)
    )
    

    giving

      ID Satisfaction_1 Satisfaction_2 Satisfaction_3
    1  1              1              3              4
    2  2              5              5              5
    3  3              5              3              4
    4  4              5              3              2
    5  5              5              3              5
    

    2) purrr Alternately the same code works with set_names in place of rename_with . In this case dplyr is not used.

    library(purrr)
    
    df %>%
      set_names(. %>%
        sub("Baseline$", 1, .) %>%
        sub("FollowUp$", 2, .) %>%
        sub("Exit$", 3, .)
    )
    

    3) Base R To ensure that we only need to write df once at the beginning so as to maintain the left to right nature of the pipe create a list with a single component data. Then we can use data multiple times in the following line.

    df |>
      list(data = _) |>
      with(setNames(data, names(data) |>
        sub("Baseline$", 1, x = _) |> 
        sub("FollowUp$", 2, x = _) |>
        sub("Exit$", 3, x = _)
      ))
    

    4) Reduce This Base R variation uses Reduce with a list L whose names are the keywords and whose values are to replace them. Then use Reduce to repeatedly apply sub.

    L <- list(Baseline = 1, FollowUp = 2, Exit = 3)
    
    df |>
      list(data = _) |>
      with(setNames(data, 
        Reduce(\(x, nm) sub(nm, L[[nm]], x), names(L), init = names(data))))
    

    5) gsubfn Create a list L as in (4) whose names are the keywords and whose values are to replace them. Then use setNames with new names computed using gsubfn. Pass gsubfn a pattern that matches substrings not containing underscore and ending at the end of the string. It will look for matches to that pattern and any that equal a name in L will be replaced by the corresponding value in L.

    library(gsubfn)
    
    L <- list(Baseline = 1, FollowUp = 2, Exit = 3)
    df |>
      list(data = _) |>
      with(setNames(data, gsubfn("[^_]+$", L, names(data))))
    

    Note

    The input in reproducible form:

    df <- data.frame(
      ID = 1:5,
      Satisfaction_Baseline = rep(c(1L, 5L), c(1L, 4L)),
      Satisfaction_FollowUp = c(3L, 5L, 3L, 3L, 3L),
      Satisfaction_Exit = c(4L, 5L, 4L, 2L, 5L)
    )