I'm writing a function in R for some code cleaning that involves removing some text in a string. I've got the following dataframe:
x <- c(1, 2, 3, 4, 5)
y <- c("y_a", "y_b", "y_c", "y_d", "y_e")
z <- c("z_f", "z_g", "z_h", "z_i", "z_j")
df <- data.frame(x, y, z)
And I'm trying to clear the y_ and z_ out of columns y and z respectively.
Normally I'd write it as
df <- df %>%
  mutate(y = str_remove(y, "y_"))
But I want to write a function so I can repeat. Right now my function looks like:
remove_weird_letter <- function(LETTER) {
  mutate(LETTER=str_remove(LETTER,"LETTER_"))
}
df <- remove_weird_letter(y)
df <- remove_weird_letter(z)
But it's not treating "LETTER" as a reference to the argument passed to the function, so it's not removing what it needs to. How do I modify my syntax so that it refers both to the column passed in as LETTER and to the text prefix built from that column's name?
Thank you!
Here is a way.
The function accepts a column and removes everything up to and including the first underscore from the beginning of each string.
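To see what the pattern does in isolation, here is a small sketch on a plain character vector. It uses base R's `sub()` as a stand-in for `stringr::str_remove()`, and the sample strings are made up for illustration.

```r
# "^[^_]+_" reads: start of string, one or more characters that are
# not an underscore, then a single underscore.
pattern <- "^[^_]+_"

# Illustrative input, not taken from the data frame in the question.
s <- c("y_a", "x827402_1", "a_b_c", "no-underscore")

# sub() replaces only the first match, much like str_remove() does.
res <- sub(pattern, "", s)
print(res)
```

Only the leading prefix goes: `"a_b_c"` keeps its second underscore, and a string without an underscore is returned unchanged.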
Three examples are given.
suppressPackageStartupMessages(
  library(dplyr)
)
remove_weird_letter <- function(x) {
  pattern <- "^[^_]+_"
  stringr::str_remove(x, pattern)
}
x <- c(1, 2, 3, 4, 5)
y <- c("y_a", "y_b", "y_c", "y_d", "y_e")
z <- c("z_f", "z_g", "z_h", "z_i", "z_j")
x827402 <- paste0("x827402_", 1:5)
x19538 <- paste0("x19538_", 1:5)
x300004192 <- paste0("x300004192_", 1:5)
df <- data.frame(x, y, z, x827402, x19538, x300004192)
# remove column name from one column
df %>%
  mutate(y = remove_weird_letter(y))
#>   x y   z   x827402   x19538   x300004192
#> 1 1 a z_f x827402_1 x19538_1 x300004192_1
#> 2 2 b z_g x827402_2 x19538_2 x300004192_2
#> 3 3 c z_h x827402_3 x19538_3 x300004192_3
#> 4 4 d z_i x827402_4 x19538_4 x300004192_4
#> 5 5 e z_j x827402_5 x19538_5 x300004192_5
# remove column names from several columns
df %>%
  mutate(across(y:x300004192, ~ remove_weird_letter(.x)))
#>   x y z x827402 x19538 x300004192
#> 1 1 a f       1      1          1
#> 2 2 b g       2      2          2
#> 3 3 c h       3      3          3
#> 4 4 d i       4      4          4
#> 5 5 e j       5      5          5
# remove column names from columns starting with "x"
df %>%
  mutate(across(starts_with("x"), ~ remove_weird_letter(.x)))
#>   x   y   z x827402 x19538 x300004192
#> 1 1 y_a z_f       1      1          1
#> 2 2 y_b z_g       2      2          2
#> 3 3 y_c z_h       3      3          3
#> 4 4 y_d z_i       4      4          4
#> 5 5 y_e z_j       5      5          5
Created on 2023-10-31 with reprex v2.0.2
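The question also asked for a function that takes the column itself and strips that column's own name as the prefix. Below is a minimal sketch of one way to do that, assuming dplyr and stringr are installed. `remove_col_prefix` is a hypothetical name, and the sketch assumes column names contain no regex special characters, since the name is pasted straight into the pattern.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: takes a data frame and an unquoted column name,
# and strips the prefix "<column name>_" from the start of that column.
remove_col_prefix <- function(data, col) {
  # Capture the unquoted argument and turn it into a string, e.g. "y".
  col_name <- rlang::as_label(rlang::enquo(col))
  # Assumption: col_name has no regex metacharacters.
  prefix <- paste0("^", col_name, "_")
  data %>%
    mutate(!!col_name := str_remove(.data[[col_name]], prefix))
}

raw <- data.frame(
  x = 1:5,
  y = c("y_a", "y_b", "y_c", "y_d", "y_e"),
  z = c("z_f", "z_g", "z_h", "z_i", "z_j")
)

# One call per column, as in the question.
cleaned <- raw %>%
  remove_col_prefix(y) %>%
  remove_col_prefix(z)

# Same idea for several columns at once: cur_column() gives the name
# of the column that across() is currently processing.
cleaned2 <- raw %>%
  mutate(across(c(y, z), ~ str_remove(.x, paste0("^", cur_column(), "_"))))

print(cleaned)
```

Unlike the generic pattern above, this only removes the prefix when it matches the column's own name, so a value such as `"q_a"` in column `y` would be left alone.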