I have a data frame like this.
df
    Languages   Order Machine Company
[1] W,X,Y,Z,H,I D     D       B
[2] W,X         B     A       G
[3] W,I         E     B       A
[4] H,I         B     C       B
[5] W           G     G       C
I want to count the rows where Languages contains at least 2 of the 3 values W, H, I.
The result should be 3, because rows 1, 3 and 4 each contain at least 2 of the 3 values W, H, I.
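You can split each string on commas, count how many of the three values appear in each row, and then count the rows where that number is at least 2: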
sum(sapply(strsplit(df$Languages, ','), function(x)
    sum(c("W","H","I") %in% x) >= 2))
#[1] 3
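To see where the 3 comes from, it can help to look at the per-row counts before they are compared with 2. The following is a minimal sketch of the same idea, not a different method; the names targets, parts and hits are introduced here only for illustration, and the data frame is rebuilt with just the Languages column so the snippet runs on its own.

```r
# Same example values as in the question (only Languages matters here)
df <- data.frame(
  Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
  stringsAsFactors = FALSE
)

targets <- c("W", "H", "I")  # illustrative name: the values to look for

# Split each comma-separated string into a character vector
parts <- strsplit(df$Languages, ",", fixed = TRUE)

# For each row, count how many of the target values are present
hits <- vapply(parts, function(x) sum(targets %in% x), integer(1))
hits
# [1] 3 1 2 2 1

# Number of rows with at least 2 of the 3 target values
sum(hits >= 2)
# [1] 3

# Which rows those are
which(hits >= 2)
# [1] 1 3 4
```

If Languages is stored as a factor (the default for character columns in data frames created before R 4.0.0), wrap it in as.character() first, because strsplit needs a character vector.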
Data:
df <- structure(list(
  Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
  Order = c("D", "B", "E", "B", "G"),
  Machine = c("D", "A", "B", "C", "G"),
  Company = c("B", "G", "A", "B", "C")),
  class = "data.frame", row.names = c(NA, -5L))