Tags: r, dataframe, filter, conditional-statements, any

Filter rows that have at least two of a set of particular values


I have a data frame like this.

df
     Languages          Order   Machine    Company
[1]    W,X,Y,Z,H,I       D         D          B
[2]    W,X               B         A          G
[3]    W,I               E         B          A
[4]    H,I               B         C          B
[5]    W                 G         G          C

I want to count the rows where Languages contains at least 2 of the 3 values W, H and I.

The result should be 3, because rows 1, 3 and 4 each contain at least 2 of the 3 values W, H and I.
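To see where the expected count of 3 comes from, here is a small R sketch (not part of the original question) that counts, for each row, how many of W, H and I appear in its Languages entry. The variable names languages, targets and matches are illustrative only.

```r
# Same Languages values as in the question; the other columns are not needed here
languages <- c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W")
targets <- c("W", "H", "I")

# For each row, count how many of the target values appear in its list
matches <- vapply(strsplit(languages, ","),
                  function(x) sum(targets %in% x),
                  integer(1))
print(matches)             # 3 1 2 2 1
print(sum(matches >= 2))   # 3
```

Only rows 1, 3 and 4 reach a count of 2 or more, which gives the expected answer.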


Solution

  • You can use:

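    # Split each Languages string on commas, count how many of W, H, I
    # appear in it, and sum the rows where that count is at least 2
    sum(sapply(strsplit(df$Languages, ','), function(x) 
               sum(c("W","H","I") %in% x) >= 2))
    #[1] 3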
    

    Data:

    df<- structure(list(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", 
    "W"), Order = c("D", "B", "E", "B", "G"), Machine = c("D", "A", 
    "B", "C", "G"), Company = c("B", "G", "A", "B", "C")), 
    class = "data.frame", row.names = c(NA, -5L))
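The question title asks to filter the rows rather than only count them. A minimal sketch, assuming the same data as above: the logical vector produced by the per-row test can be used directly as a row index. The helper has_at_least and its n argument are hypothetical, introduced here only for illustration.

```r
df <- data.frame(
  Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
  Order     = c("D", "B", "E", "B", "G"),
  Machine   = c("D", "A", "B", "C", "G"),
  Company   = c("B", "G", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each comma-separated string that contains
# at least `n` of the values in `targets`
has_at_least <- function(strings, targets, n) {
  vapply(strsplit(strings, ",", fixed = TRUE),
         function(x) sum(targets %in% trimws(x)) >= n,
         logical(1))
}

keep <- has_at_least(df$Languages, c("W", "H", "I"), 2)
filtered <- df[keep, ]   # keeps rows 1, 3 and 4
print(filtered)
print(nrow(filtered))    # 3
```

trimws() is applied so that entries written with a space after the comma (e.g. "W, H") still match; it can be dropped if the data never contains spaces. The count from the question is then simply nrow(filtered) or sum(keep).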