Tags: r, regex, tidyverse, tidyselect

Rename columns of R dataframe with tidyselect and regular expression


I have a data frame whose column names combine a running number, a short code, and some longer text:

  1. A1. Good day

  2. A1a. Have a nice day

......

  99. Z7d. Some other titles

Now I want to keep only the codes "A1.", "A1a.", "Z7d.", removing both the leading number and the trailing text. How can I do this with tidyselect and a regular expression?


Solution

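  • You can use this regex with sub(). It matches the leading number and its dot ('\\d+\\.'), the whitespace that follows ('\\s+'), captures the alphanumeric code in a group, and consumes the rest of the name ('.*'); the whole match is then replaced by the captured group ('\\1'). Note that the captured code does not include its trailing dot.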

    names(df) <- sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', names(df))
    names(df)
    #[1] "A1"  "A1a" "Z7d"
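
The question asks for the codes with their trailing dot ("A1."), while the pattern above drops it. As a minimal sketch, assuming every name follows the "number. code. text" layout and the dot should be kept, the dot can be moved inside the capture group and the pattern anchored at both ends. The old_names vector below is a hypothetical stand-in for names(df):

```r
# Hypothetical sample names in the layout described in the question
old_names <- c("1. A1. Good day",
               "2. A1a. Have a nice day",
               "99. Z7d. Some other titles")

# ^\d+\.\s+  skips the leading number, its dot, and the whitespace
# (...\.)    captures the alphanumeric code together with its dot
# .*$        consumes the remaining text
new_names <- sub("^\\d+\\.\\s+([A-Za-z0-9]+\\.).*$", "\\1", old_names)
new_names
```

Anchoring with ^ and $ is optional here; it means a name that does not follow the layout is left unchanged rather than partly rewritten.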
    

    The same regex also works inside dplyr's rename_with() if you prefer a tidyverse answer.

    library(dplyr)
    df %>% rename_with(~sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', .))
    
    #          A1        A1a        Z7d
    #1  0.5755992  0.4147519 -0.1474461
    #2  0.1347792 -0.6277678  0.3263348
    #3  1.6884930  1.3931306  0.8809109
    #4 -0.4269351 -1.2922231 -0.3362182
    #5 -2.0032113  0.2619571  0.4496466
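
Since the question mentions tidyselect, one option is to pass a selection helper to the .cols argument of rename_with(), so that only columns whose names start with a number and a dot are renamed. This is a sketch rather than part of the original answer; the data frame df2 and its id column are made up for illustration:

```r
library(dplyr)

# Hypothetical data frame: an `id` column to leave alone plus two numbered columns
df2 <- data.frame(id = 1:3, x = c(0.1, 0.2, 0.3), y = c(1, 2, 3))
names(df2) <- c("id", "1. A1. Good day", "2. A1a. Have a nice day")

renamed <- df2 %>%
  rename_with(
    # Replace the whole name with the captured alphanumeric code
    ~ sub("^\\d+\\.\\s+([A-Za-z0-9]+).*$", "\\1", .x),
    # tidyselect helper: only touch names that start with "<digits>. "
    .cols = matches("^\\d+\\.\\s")
  )
names(renamed)
```

sub() already leaves non-matching names unchanged, so .cols mainly makes the intent explicit and keeps the renaming function away from columns it was never meant to see.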
    

    Data used above:

    df <- structure(list(`1. A1. Good day` = c(0.575599213383783, 0.134779160673435, 
    1.68849296209512, -0.426935114884432, -2.00321125417319), `2. A1a. Have a nice day` = c(0.414751904860513, 
    -0.627767775889949, 1.39313055331098, -1.29222310608057, 0.261957078465535
    ), `99. Z7d. Some other titles` = c(-0.147446140558093, 0.326334824433201, 
    0.880910933597998, -0.336218174873965, 0.449646567320979)), 
    class = "data.frame", row.names = c(NA, -5L))