I have a column of strings in a data frame, and I would like to replace each value with only the substring before the first " (", i.e., before the first space/open-bracket pair. Not all of the strings contain brackets, and I want those to be left as they are.
Example data:
col1 <- c(1, 2, 3, 4)
col2 <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "bcd")
df <- data.frame(col1, col2)
df
Output:
  col1         col2
1    1 a b (ABC DE)
2    2          bcd
3    3   cd ef (CE)
4    4          bcd
The output I'm looking for would be something like this:
col1 <- c(1, 2, 3, 4)
col2 <- c("a b", "bcd", "cd ef", "bcd")
df <- data.frame(col1, col2)
df
Output:
  col1  col2
1    1   a b
2    2   bcd
3    3 cd ef
4    4   bcd
The actual data frame has 40000+ rows, with the strings taking many possible values, so it can't be done manually like in the example. I'm not at all confident working with regex/patterns, but I accept that this may be the most straightforward way to do it.
A possible solution, based on stringr:
library(tidyverse)

# Remove the bracketed part, along with any whitespace around it
df %>%
  mutate(col2 = str_remove_all(col2, "\\s*\\(.*\\)\\s*"))
#>   col1  col2
#> 1    1   a b
#> 2    2   bcd
#> 3    3 cd ef
#> 4    4   bcd
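The pattern above removes a bracketed group, so a string such as "a (b) c" would, as far as I can tell, come back as "ac" rather than "a". If the goal is literally "everything before the first space/open-bracket pair", a minimal base R sketch could look like the following. The last two strings are hypothetical edge cases added for illustration; they are not from the question's data.

```r
# Example strings from the question, plus two hypothetical edge cases
col2 <- c("a b (ABC DE)", "bcd", "cd ef (CE)", "a (b) c", "x (unclosed")

# " \\(" matches a literal space followed by "(".
# ".*$"  then matches everything up to the end of the string.
# sub() replaces only the first (leftmost) match, so the string is cut
# at the first " ("; strings without " (" are returned unchanged.
trimmed <- sub(" \\(.*$", "", col2)
print(trimmed)

# Applied to a data frame column (same column names as in the question)
df <- data.frame(col1 = 1:5, col2 = col2)
df$col2 <- sub(" \\(.*$", "", df$col2)
```

With stringr, `str_remove(col2, " \\(.*")` should behave the same way, since `str_remove()` also removes only the first match. Either version is vectorised, so 40000+ rows should not be a problem.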