
Parsing column names with characters and numbers in R tidyverse


I am using the R tidyverse and trying to rename specific columns. The column names in question contain text followed by a number representing a year, like "var 1975", "var 1976", etc. I would like to turn these names into "var_75", "var_76", etc. In other words, I want to take the year, drop its first two digits, and append the remaining two digits to the text, separated by an underscore instead of a space.

I am trying this:

library(tidyverse)

df <- tibble("var 1975" = 1:5,
             "var 1976" = c(3, 2, 1, 1, 1),
             "age" = c(25, 41, 39, 60, 36),
             "satisfaction" = c(5, 3, 2, 5, 4))

# Output:
# # A tibble: 5 × 4
#   `var 1975` `var 1976`   age satisfaction
#        <int>      <dbl> <dbl>        <dbl>
# 1          1          3    25            5
# 2          2          2    41            3
# 3          3          1    39            2
# 4          4          1    60            5
# 5          5          1    36            4



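df <- df %>%
  # add the two-digit year as a suffix: "var 1975" -> "var 1975_75"
  rename_with(.fn = function(.x) paste0(.x, "_", parse_number(.x) - 1900),
              .cols = contains("var")) %>%
  # drop the space and the four-digit year: "var 1975_75" -> "var_75"
  rename_with(.fn = ~ gsub("^(\\D+?)\\s*\\d+_(\\d+)$", "\\1_\\2", .x, perl = TRUE),
              .cols = contains("var"))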
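The steps described in the question (take the year, keep its last two digits, join them to the text with an underscore) can also be written out explicitly instead of with two regex passes. This is a minimal sketch, assuming each targeted name is a text prefix, optional whitespace and a four-digit year at the very end; the helper name `two_digit_year_name()` is hypothetical and not part of any package.

```r
library(tidyverse)

df <- tibble("var 1975" = 1:5,
             "var 1976" = c(3, 2, 1, 1, 1),
             "age" = c(25, 41, 39, 60, 36),
             "satisfaction" = c(5, 3, 2, 5, 4))

# Hypothetical helper: rebuild each name from its text prefix and the
# last two digits of the four-digit year that ends it.
two_digit_year_name <- function(x) {
  prefix <- str_remove(x, "\\s*\\d{4}$")         # "var 1975" -> "var"
  year   <- as.integer(parse_number(x)) %% 100L  # "var 1975" -> 75
  sprintf("%s_%02d", prefix, year)               # -> "var_75"
}

df_explicit <- rename_with(df, two_digit_year_name, .cols = contains("var"))
names(df_explicit)
# returns c("var_75", "var_76", "age", "satisfaction")
```

Taking `year %% 100` with `%02d` zero-padding turns 2003 into "03", whereas subtracting 1900 (as in the attempt above) would give 103.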

Solution

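    rename_with(df, ~ str_replace(., " \\d{2}", "_"), contains("var"))

    Explanation: " \\d{2}" is a regular expression that matches a space followed by two digits, i.e. the space and the first two digits of the year (" 19" in "var 1975"). str_replace() replaces that first match with an underscore, so "var 1975" becomes "var_75". contains("var") limits the renaming to columns whose names contain "var".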

    Output:

    # A tibble: 5 × 4
      var_75 var_76   age satisfaction
       <int>  <dbl> <dbl>        <dbl>
    1      1      3    25            5
    2      2      2    41            3
    3      3      1    39            2
    4      4      1    60            5
    5      5      1    36            4
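The answer above selects columns by the text "var". If year-suffixed columns could have other prefixes, selecting by the shape of the name may be more convenient. A sketch, assuming every targeted name ends in a space followed by a four-digit year: it uses tidyselect's `matches()` to pick columns by pattern and a capture group to keep only the last two digits.

```r
library(tidyverse)

df <- tibble("var 1975" = 1:5,
             "var 1976" = c(3, 2, 1, 1, 1),
             "age" = c(25, 41, 39, 60, 36),
             "satisfaction" = c(5, 3, 2, 5, 4))

df_anchored <- rename_with(
  df,
  # Keep the last two digits of the year via a capture group:
  # "var 1975" -> "var_75"
  ~ str_replace(.x, " \\d{2}(\\d{2})$", "_\\1"),
  # Select by the shape of the name (a space plus four digits at the end)
  # rather than by the text "var"
  matches(" [0-9]{4}$")
)
names(df_anchored)
# returns c("var_75", "var_76", "age", "satisfaction")
```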