r, dplyr, tidyverse, stringr

Removing a prefix from a subset of column names using the str_remove function


In this SO post, the accepted answer shows how to remove a prefix from a subset of column names. I will reproduce the toy data and the solution, then describe my issue. Note that I have altered the toy data by adding a suffix (_end) to two of the variables.

df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10), ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))

df
    
#        ATH_V1  ATH_V2_end ATH_V3_end     ATH_V4     ATH_V5      ATH_V6      ATH_V7
# 1  -1.5520380  1.16782520 -0.3628090  1.5238728 -1.1660806 -1.01416226 -0.95163564
# 2   0.6270134  1.63810443  0.2199733 -0.6175186 -1.8909463 -0.23913125 -0.70650296
# 3  -0.7462879  0.08504734  0.6506818 -0.5436457  1.3369322  1.69883194 -1.07623124
# 4   0.3196569  0.95782069 -0.3454795 -1.7485607  2.3896003  1.24958489 -0.73316675
# 5  -0.8820414 -2.01739089 -0.5881156  1.2725712  1.4251221  0.56213069 -0.47188011
# 6  -0.5534390  1.48974625 -0.2532402 -1.2333677  1.6690452 -0.48178503  0.30727117
# 7  -0.4637729 -1.13762829  1.3072153  1.0082090 -1.7958189 -1.37604307 -0.08900913
# 8  -0.3878013 -1.09693619 -0.9022672  0.1809460 -1.0303186  0.54576930 -0.64634653
# 9  -0.9553941  0.91495814 -0.2993733 -0.5860527 -0.5623538 -0.24521585  0.21297231
# 10  2.2891475  0.05568124 -0.1718192  0.4249103  2.6009601  0.06357305  0.47794076

I would like to remove the ATH_ prefix ONLY from the columns that end with _end.

The solution in the original post passes the names of the columns to operate on as a character vector to rename_at and then removes the ATH_ prefix with the str_remove function:

df %>% rename_at(c("ATH_V2_end", "ATH_V3_end"), ~ .x %>% str_remove("^ATH_"))

#         ATH_V1     V2_end      V3_end      ATH_V4     ATH_V5     ATH_V6      ATH_V7
# 1   1.14822123 -0.6285561  0.52458507 -0.63906454  1.1401342 -1.6559726  0.41732258
# 2   0.07519307  2.0090135  0.13440368  1.24337727 -0.2906335 -0.1349698  1.45647898
# 3  -0.87465492 -1.8766134 -0.17119197 -1.22701678 -0.7603659  0.1015543 -1.06211069
# 4   1.01402581 -0.4744169  0.78326842 -0.02910686  0.1548202  1.0042147 -0.23739832
# 5   1.00613252 -1.5701097  1.64415870  0.86733910  0.1558727  0.3011537  0.05700506
# 6  -1.01416351 -1.7687648 -0.13999833 -1.01482747 -0.5732621 -0.2504362  2.20762232
# 7   1.00861721  0.7494679  0.08853307  1.46402775 -0.1153655  0.8427913 -1.16114455
# 8   0.28117809 -0.6669487 -0.50816389 -0.12875270  0.7798111 -0.3937148 -1.30894602
# 9  -0.23092640  2.8516271 -1.36959691 -0.39303227  1.9862182  1.2378769 -1.66039502
# 10  0.65034202  0.9009923  0.58264859  0.50931251  1.7284268  1.8420746 -0.71894637

However, the dplyr help states that rename_at has been superseded by rename_with, and that you can use the select helper functions to choose a subset of columns.

So I would like to remove the ATH_ prefix ONLY from the columns that end with _end using the ends_with() function within rename_with() using tidyverse grammar.

I tried

df %>%
  select(ends_with("_end")) %>%
    rename_with(str_remove(string = ~.x,
                           pattern = "^ATH_"))

and

df %>%
  rename_with(cols = ends_with("_end"),
              .fn = str_remove(string = ~.x,
                               pattern = "^ATH_"))

Both attempts gave the same error:

Error in `rename_with()`:
! Can't convert `.fn`, a character vector, to a function.

Any help much appreciated


Solution

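  • You put the ~ symbol in the wrong place. In your attempts, str_remove() is evaluated immediately, so .fn receives its result (a character vector) instead of a function, which is exactly what the error message reports. The ~ has to come before the whole call so that the expression becomes a lambda. Note also that the argument is named .cols, not cols. It should be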

    df %>%
      rename_with(.cols = ends_with("_end"),
                  .fn = ~ str_remove(string = .x, pattern = "^ATH_"))
    
            ATH_V1       V2_end     V3_end      ATH_V4     ATH_V5     ATH_V6      ATH_V7
    1   1.50743299 -0.445307241  0.8299688  0.17539549 -0.1327284 -0.3396151  0.51307888
    2  -1.41938708  0.778638127 -0.2813838 -0.32856970  0.1652872 -0.3049578  0.94609307
    3   0.67968358 -1.424279034  0.4743970  0.07742006  0.1302074  0.2824700 -0.62150878
    4   1.37265457  0.626442526 -0.9043668 -1.26182381 -2.0965678  1.5024311 -0.13721899
    5   1.56945505 -0.808444575 -0.6629072 -1.05412193  2.2763880 -2.0970344 -1.67471537
    6  -1.33771537  1.610411569  0.3740234  1.08666291  0.4914622  0.2749874  3.37133643
    7  -0.02463483 -0.008389356  0.7068729 -0.03796850  0.3389535  0.9763993 -0.34287204
    8   0.31237309  0.011720063  0.1572582 -0.17382867  0.3284980  0.2716920 -0.07771273
    9  -1.20628787 -0.654695991 -0.3015155  0.32320577  2.1091207 -0.2484013 -1.46188370
    10 -0.56686265 -0.279659749  0.1913190 -1.58601761 -0.3031979 -1.2062704 -0.26730244
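
    To make the role of the ~ more concrete, here is a minimal sketch that replaces the formula with an explicit function. The helper name strip_prefix and the small integer toy data are illustrative assumptions, not part of the original post; the idea is only that .fn must be something callable that maps old names to new names.

```r
library(dplyr)
library(stringr)

# Toy data with the same naming pattern as in the question (values are arbitrary).
df <- data.frame(ATH_V1 = 1:3, ATH_V2_end = 4:6, ATH_V3_end = 7:9, ATH_V4 = 10:12)

# Hypothetical helper: an explicit function doing what `~ str_remove(.x, "^ATH_")` does.
strip_prefix <- function(x) str_remove(x, "^ATH_")

# .fn receives the selected column names as a character vector and returns the new names.
renamed <- df %>%
  rename_with(.fn = strip_prefix, .cols = ends_with("_end"))

names(renamed)
```

    Only the two columns selected by ends_with("_end") are passed to the function, so the other ATH_ columns keep their prefix.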
    

    A more concise version is

    df %>%
      rename_with(~ str_remove(.x, "^ATH_"), ends_with("_end"))
    

    and even

    df %>%
      rename_with(str_remove, ends_with("_end"), "^ATH_")
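
    As a quick sanity check, the three forms above can be compared on the same data. This is a sketch under assumptions: the seed, the reduced toy data frame and the object names (res_named, res_lambda, res_dots) are made up for illustration. In the last form, "^ATH_" travels through ... and ends up as the pattern argument of str_remove.

```r
library(dplyr)
library(stringr)

set.seed(1)  # fixed seed so the toy data is reproducible
df <- data.frame(ATH_V1 = rnorm(3), ATH_V2_end = rnorm(3),
                 ATH_V3_end = rnorm(3), ATH_V4 = rnorm(3))

# Form 1: named arguments with a formula lambda.
res_named <- df %>%
  rename_with(.cols = ends_with("_end"),
              .fn = ~ str_remove(string = .x, pattern = "^ATH_"))

# Form 2: positional arguments with a formula lambda.
res_lambda <- df %>%
  rename_with(~ str_remove(.x, "^ATH_"), ends_with("_end"))

# Form 3: bare function; the extra argument is forwarded to str_remove via `...`.
res_dots <- df %>%
  rename_with(str_remove, ends_with("_end"), "^ATH_")

# All three should produce the same data frame; stopifnot() errors otherwise.
stopifnot(identical(res_named, res_lambda), identical(res_lambda, res_dots))
names(res_dots)
```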