r, dataframe, dplyr, filtering, stringr

Filter rows containing only one specific string value in R


I have a dataframe in R as below:

Fruits
Apple:1
Apple:4
Banana
Papaya    
Orange, Apple:2

I want to keep only the rows in which Apple is the sole fruit:

Apple:1
Apple:4 

I tried using dplyr package.

df <- dplyr::filter(df, grepl('Apple', Fruits))

But this keeps every row that contains Apple, including the row that also lists another fruit:

Apple:1
Apple:4
Orange, Apple:2

How can I drop rows that contain multiple fruits and keep only the rows with one specific string (in this case Apple)?


Solution

  • EDIT:

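    Assuming, based on the OP's comments, that the goal is to keep strings in which Apple is the only fruit mentioned, and assuming further that the list of non-Apple fruits is manageable, you could do this:

    library(dplyr)
    library(stringr)

    df %>% 
      filter(str_detect(Fruits, '^(?!.*(Banana|Orange)).*Apple'))
                       Fruits
    1 Apple, Apple:2, Apple:7
    

    Here, the negative look-ahead (?!.*(Banana|Orange)) asserts that neither Banana nor Orange appears anywhere in the string; the rest of the pattern then requires Apple. The alternation is wrapped in its own group because an unparenthesized .*Banana|Orange would only reject Orange when it sits at the very start of the string, so a value such as "Apple:1, Orange" would slip through.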

    Data:

    df <- data.frame(
      Fruits = c("Orange, Apple:2", 
                 "Apple, Apple:2, Apple:7", 
                 "Apple:2, Banana:10"))
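
If the list of other fruits is not known in advance, a different approach (not the look-ahead above) may be easier to maintain: split each string on commas and keep the row only when every item is the target fruit. The following is a minimal base R sketch under some assumptions: `only_fruit` is a hypothetical helper name, the items are assumed to be comma-separated, and each item is assumed to be either the bare fruit name or the name followed by `:` and a count. The data is the sample from the question, with "Banana" spelled consistently.

```r
# Sample data from the question
df <- data.frame(
  Fruits = c("Apple:1", "Apple:4", "Banana", "Papaya", "Orange, Apple:2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every comma-separated item in a string
# is `fruit` itself or `fruit` followed by ":" (e.g. "Apple:2").
only_fruit <- function(x, fruit) {
  items <- strsplit(x, ",", fixed = TRUE)
  vapply(items, function(parts) {
    parts <- trimws(parts)
    ok <- parts == fruit | startsWith(parts, paste0(fruit, ":"))
    # Empty or missing strings are treated as non-matches
    length(parts) > 0 && isTRUE(all(ok))
  }, logical(1))
}

# Keep only the rows whose sole fruit is Apple
result <- df[only_fruit(df$Fruits, "Apple"), , drop = FALSE]
print(result)
```

Because the helper returns a plain logical vector, it should also work inside dplyr, for example `dplyr::filter(df, only_fruit(Fruits, "Apple"))`. The trade-off is that it relies on the comma-separated layout, whereas the look-ahead pattern works on arbitrary text but needs the other fruits listed explicitly.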