Tags: r, string, function, dplyr, stringr

Writing a function that reads both column y and text "y" in str_remove


I'm writing a function in R for some code cleaning that involves removing some text in a string. I've got the following dataframe:

x <- c(1, 2, 3, 4, 5)
y <-c("y_a","y_b","y_c","y_d","y_e")
z <-c("z_f","z_g","z_h","z_i","z_j")
df <- data.frame(x, y, z)

And I'm trying to clear the y_ and z_ out of columns y and z respectively.

Normally I'd write it as

df <- df %>%
mutate(y=str_remove(y,"y_"))

But I want to write a function so I can repeat. Right now my function looks like:

remove_weird_letter <- function(LETTER) {
mutate(LETTER=str_remove(LETTER,"LETTER_"))
}

df <- remove_weird_letter(y)
df <- remove_weird_letter(z)

But it's not reading "LETTER" as a reference to the object passed to the function, so it's not removing what it needs to. How do I modify my syntax so that it knows to reference both a column called LETTER and text containing LETTER?

Thank you!


Solution

  • Here is a way.
    The function accepts a column and removes everything up to and including the first underscore from the beginning of each string.
    Three examples are given.

    suppressPackageStartupMessages(
      library(dplyr)
    )
    
    remove_weird_letter <- function(x) {
      pattern <- "^[^_]+_"
      stringr::str_remove(x, pattern)
    }
    
    x <- c(1, 2, 3, 4, 5)
    y <-c("y_a","y_b","y_c","y_d","y_e")
    z <-c("z_f","z_g","z_h","z_i","z_j")
    x827402 <- paste0("x827402_", 1:5)
    x19538 <- paste0("x19538_", 1:5)
    x300004192 <- paste0("x300004192_", 1:5)
    df <- data.frame(x, y, z, x827402, x19538, x300004192)
    
    # remove column name from one column
    df %>%
      mutate(y = remove_weird_letter(y))
    #>   x y   z   x827402   x19538   x300004192
    #> 1 1 a z_f x827402_1 x19538_1 x300004192_1
    #> 2 2 b z_g x827402_2 x19538_2 x300004192_2
    #> 3 3 c z_h x827402_3 x19538_3 x300004192_3
    #> 4 4 d z_i x827402_4 x19538_4 x300004192_4
    #> 5 5 e z_j x827402_5 x19538_5 x300004192_5
    
    # remove column names from several columns
    df %>%
      mutate(across(y:x300004192, ~ remove_weird_letter(.x)))
    #>   x y z x827402 x19538 x300004192
    #> 1 1 a f       1      1          1
    #> 2 2 b g       2      2          2
    #> 3 3 c h       3      3          3
    #> 4 4 d i       4      4          4
    #> 5 5 e j       5      5          5
    
    # remove column names from columns starting with "x"
    df %>%
      mutate(across(starts_with("x"), ~ remove_weird_letter(.x)))
    #>   x   y   z x827402 x19538 x300004192
    #> 1 1 y_a z_f       1      1          1
    #> 2 2 y_b z_g       2      2          2
    #> 3 3 y_c z_h       3      3          3
    #> 4 4 y_d z_i       4      4          4
    #> 5 5 y_e z_j       5      5          5
    

    Created on 2023-10-31 with reprex v2.0.2
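
The answer above sidesteps the original question by matching any prefix with a regex. If you do want a function that uses the same argument both as a column reference and as literal text, here is a minimal sketch. The helper name `remove_col_prefix` is made up for illustration and is not part of the answer above; it assumes the column names contain no regex metacharacters.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: takes a data frame and a bare column name, then strips
# the literal prefix "<column name>_" from the start of that column's values.
# Assumption: the column name has no regex metacharacters (e.g. "." or "+").
remove_col_prefix <- function(data, col) {
  # Turn the bare column name into a string, e.g. y becomes "y"
  col_name <- rlang::as_name(rlang::enquo(col))
  prefix <- paste0("^", col_name, "_")
  # {{ col }} refers to the column itself; `:=` allows an embraced name on the left
  mutate(data, {{ col }} := str_remove({{ col }}, prefix))
}

df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c("y_a", "y_b", "y_c", "y_d", "y_e"),
  z = c("z_f", "z_g", "z_h", "z_i", "z_j")
)

df <- df %>%
  remove_col_prefix(y) %>%
  remove_col_prefix(z)

# The same idea for several columns at once: cur_column() gives the name of
# the column that across() is currently processing.
df2 <- data.frame(y = c("y_a", "y_b"), z = c("z_f", "z_g")) %>%
  mutate(across(c(y, z), ~ str_remove(.x, paste0("^", cur_column(), "_"))))
```

Unlike the regex-only version, this only removes a prefix that matches the column's own name, so a value such as "q_a" in column y would be left untouched. Whether that is preferable depends on how uniform your data is.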