Trimming everything down to alphabetic characters (R)


I've been trying for a long time to find a relatively simple command that strips non-alphabetic characters from the beginning and end of a string. It is important that characters inside the text, e.g. digits, are left untouched.

Let me give you an example:

a <- c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings." )
b <- c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
data <- data.frame(cbind(a, b))

The desired outcome is:

a <- c("dog with 4 legs", "cat with 1 tail", "bird with 2 wings" )
b <- c("mouse with 1 tail", "pig with 1 nose", "cow with 4 spots")
data_cleaned <- data.frame(cbind(a, b))

Is there a simple solution?


Solution

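  • You could use trimws(), whose whitespace argument (available since R 3.6.0) takes a regular expression for the characters to strip from both ends: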

    data[1:2] <- lapply(data[1:2], trimws, whitespace = "[^A-Za-z]+")
    data
    
    #                   a                 b
    # 1   dog with 4 legs mouse with 1 tail
    # 2   cat with 1 tail   pig with 1 nose
    # 3 bird with 2 wings  cow with 4 spots
    

    Its dplyr equivalent is

    library(dplyr)
    
    # Lambda form: passing extra arguments through across() is deprecated in dplyr >= 1.1.0
    data %>%
      mutate(across(a:b, ~ trimws(.x, whitespace = "[^A-Za-z]+")))
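
For older R versions where trimws() has no whitespace argument, roughly the same effect can be sketched with an anchored regular expression in gsub(). The helper name trim_non_alpha is hypothetical (it is not part of base R), and the sketch assumes "alphabetic" means the ASCII letters A-Z and a-z, as in the answer above:

```r
# Hypothetical helper: remove runs of non-letters at the start (^...) or
# at the end (...$) of each string; characters in the middle are untouched.
trim_non_alpha <- function(x) {
  gsub("^[^A-Za-z]+|[^A-Za-z]+$", "", x)
}

a <- c("1) dog with 4 legs", "- cat with 1 tail", "2./ bird with 2 wings.")
b <- c("07 mouse with 1 tail.", "2.pig with 1 nose,,", "$ cow with 4 spots_")
data <- data.frame(a, b)

# Apply the helper to every column while keeping the data frame structure
data[] <- lapply(data, trim_non_alpha)
data
```

If the text may start or end with accented letters, the character class [^[:alpha:]] may be a better fit than [^A-Za-z], since the latter also strips non-ASCII letters at the edges.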