I've been looking for a long time for a reasonably simple command that strips non-alphabetic characters from the beginning and end of a string. It is important, however, that characters inside the text, such as digits, are left untouched.
Let me give you an example:
a <- c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings." )
b <- c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
data <- data.frame(cbind(a, b))
The proper outcome would be this:
a <- c("dog with 4 legs", "cat with 1 tail", "bird with 2 wings" )
b <- c("mouse with 1 tail", "pig with 1 nose", "cow with 4 spots")
data_cleaned <- data.frame(cbind(a, b))
Is there a simple solution?
You could use trimws() with a custom whitespace pattern (the whitespace argument requires R 3.6.0 or later):
data[1:2] <- lapply(data[1:2], trimws, whitespace = "[^A-Za-z]+")
data
#                   a                 b
# 1   dog with 4 legs mouse with 1 tail
# 2   cat with 1 tail   pig with 1 nose
# 3 bird with 2 wings  cow with 4 spots
Its dplyr equivalent is:
library(dplyr)
data %>%
  # Lambda form: passing extra arguments through across() is deprecated as of dplyr 1.1.0
  mutate(across(a:b, ~ trimws(.x, whitespace = "[^A-Za-z]+")))
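If you are on an R version older than 3.6.0, or just want to see exactly what gets removed, the same trimming can be sketched with a single anchored regular expression in gsub(). This is only a rough equivalent, not what trimws() runs internally, and the helper name trim_non_alpha is made up for illustration:

```r
# Hypothetical helper (not part of base R): strips runs of non-letters
# that are anchored at the start or at the end of each string.
trim_non_alpha <- function(x) {
  gsub("^[^A-Za-z]+|[^A-Za-z]+$", "", x)
}

# Example data from the question
a <- c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings.")
b <- c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
data <- data.frame(a, b, stringsAsFactors = FALSE)

# Apply to every column; data[] keeps the data.frame structure
data[] <- lapply(data, trim_non_alpha)
data
```

Because both alternatives are anchored (^ and $), digits and punctuation in the middle of the text, such as the "4" in "dog with 4 legs", are never touched.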