Remove characters prior to parentheses but after the preceding comma in R


I have the following character vector (named df):

df<-c("red apples,(golden,red delicious),bananas,(cavendish,lady finger),golden pears","yellow pineapples,red tomatoes,(roma,vine),orange carrots")

I want to remove the item that immediately precedes each comma followed by an opening parenthesis, together with the ",(" itself, so that my output would be:

[1] "golden,red delicious),cavendish,lady finger),golden pears" "yellow pineapples,roma,vine),orange carrots"

Ideally, the right parenthesis would be removed as well, but I can handle that deletion with a separate gsub call.

I feel like a lookbehind might work but can't seem to code it correctly.

Thanks!

Edit: I amended the vector so that the item I want deleted can be a string of two words.


Solution

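  • We can use base R's gsub to remove the characters. We match an optional leading word followed by whitespace ((\\w+\\s+)?), then a word (\\w+), a comma (,) and an opening parenthesis (\\(), and replace the match with an empty string (""). The leading word is optional so that single-word items such as bananas are removed as well as two-word items such as red apples; a pattern that requires exactly two words would leave "bananas,(" in place.

    gsub("(\\w+\\s+)?\\w+,\\(", "", df)
    #[1] "golden,red delicious),cavendish,lady finger),golden pears"
    #[2] "yellow pineapples,roma,vine),orange carrots"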
    

    Or, if the items are always delimited by commas, we can skip describing the words and instead match one or more characters that are not a comma ([^,]+), followed by ,(

    gsub("[^,]+,\\(", "", df)
    #[1] "golden,red delicious),cavendish,lady finger),golden pears" 
    #[2] "yellow pineapples,roma,vine),orange carrots"
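
The question also asks about dropping the right parenthesis and mentions lookarounds, so here is a minimal sketch of both ideas, building on the [^,]+ pattern above. The variable names no_parens and keep_parens are only illustrative. A lookahead is used rather than a lookbehind because the text to delete comes before the parenthesis, and lookarounds in gsub require perl = TRUE.

```r
# Input vector from the question (named df there, although it is a character vector).
df <- c("red apples,(golden,red delicious),bananas,(cavendish,lady finger),golden pears",
        "yellow pineapples,red tomatoes,(roma,vine),orange carrots")

# Alternation: delete either "<item>,(" or a lone ")" in a single pass.
no_parens <- gsub("[^,]+,\\(|\\)", "", df)
print(no_parens)
#[1] "golden,red delicious,cavendish,lady finger,golden pears"
#[2] "yellow pineapples,roma,vine,orange carrots"

# Lookahead: delete "<item>," only when "(" follows; the "(" itself is not consumed.
keep_parens <- gsub("[^,]+,(?=\\()", "", df, perl = TRUE)
print(keep_parens)
#[1] "(golden,red delicious),(cavendish,lady finger),golden pears"
#[2] "yellow pineapples,(roma,vine),orange carrots"
```

The first call assumes every ")" in the input should be removed; if some closing parentheses must survive, a separate and more specific pattern would be needed.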