r dplyr gsub stringr tibble

How to remove ONLY a specific group of characters from both the names and values of a data frame in R


Assuming this is my df:

library(tibble)
df <- tibble(`a*`=c("_x__", "*y", "z+-"),
             b=c("_x__", "*y", "z+-"))
> df
# A tibble: 3 x 2
  `a*`  b    
  <chr> <chr>
1 _x__  _x__ 
2 *y    *y   
3 z+-   z+-  

I want to remove the *, _ and + characters from both the column names and the values, wherever they occur, to get

# A tibble: 3 x 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-  

So I am using gsub(), but it only removes the first character. In fact, I am looking for a neat way to make both of these changes using dplyr pipes. Any hint or idea is appreciated.

df %>%
  mutate_all(funs(gsub(c("_","[*]","+"),"",.))) 


names(df) <- str_remove_all("[*]")

Solution

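  • We can match several characters at once by putting them inside a character class [] in str_remove_all or gsub. A vector of patterns does not work in gsub, because its pattern argument is not vectorized: only the first element ("_" in the attempt above) is used.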

    library(dplyr)
    library(stringr)
    # Strip *, _ and + from every value; .names applies the same pattern to the column names
    df <- df %>% 
       transmute(across(everything(), ~ str_remove_all(.x, "[*_+]"),
        .names = "{str_remove_all(.col, '[*_+]')}"))
    

    Output:

    df
    # A tibble: 3 × 2
      a     b    
      <chr> <chr>
    1 x     x    
    2 y     y    
    3 z-    z-
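
For comparison, the same character-class idea can be sketched in base R with gsub alone. This is only an illustration, not part of the original answer: it assumes a plain data.frame rather than a tibble, and the variable name `pattern` is introduced here for convenience.

```r
# Base R sketch: rebuild the example as a plain data.frame (an assumption;
# check.names = FALSE keeps the non-syntactic column name `a*` intact).
df <- data.frame(`a*` = c("_x__", "*y", "z+-"),
                 b    = c("_x__", "*y", "z+-"),
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

# Inside [], * and + are literal characters, so one pattern covers all three.
pattern <- "[*_+]"

# Clean the values of every column; df[] keeps the data.frame structure.
df[] <- lapply(df, gsub, pattern = pattern, replacement = "")

# Clean the column names with the same pattern.
names(df) <- gsub(pattern, "", names(df))

df
```

The `-` in `z+-` is left alone because it is not listed in the character class.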