I am trying to remove rows in my dataframe that meet 2 conditions simultaneously. For example, based on the dataframe created below, I want to remove rows that are both green AND group A. However, based on the code I am using, rows are removed when they are green or group A.
data = data.frame(Group = c(rep('A',9), rep('B',9)),
Color= c(rep('Red',3), rep('Green',3), rep('Yellow', 3), rep('Red',4), rep('Green',5)))
summary(data)
names <- c(1:2)
data[,names] <- lapply(data[,names], factor)
summary(data)
newdata <- subset(data, Group != "A" & Color != "Green")
summary(newdata)
How can I get the result I am aiming for?
It sounds like you want this:
             Group A    Not Group A
Green        EXCLUDE    include
Not Green    include    include
Your line subset(data, Group != "A" & Color != "Green") means you are only keeping rows that are BOTH Not Group A and Not Green, which is just the bottom right category. You want rows that are EITHER Not Group A OR Not Green, which could be done with | (OR) where you currently have & (AND).
Or, as ~Darren-tsai noted, you could keep the rows that are not BOTH A and Green, i.e. !(Group == "A" & Color == "Green").
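To check this, here is a short sketch that rebuilds the data frame from the question and compares the original filter with the two suggested fixes. The variable names too_strict, keep_or and keep_not are made up for illustration. The two fixes should be equivalent by De Morgan's law: !(a & b) is the same as !a | !b.

```r
# Same data as in the question
data <- data.frame(
  Group = c(rep("A", 9), rep("B", 9)),
  Color = c(rep("Red", 3), rep("Green", 3), rep("Yellow", 3),
            rep("Red", 4), rep("Green", 5))
)

# Original attempt: keeps only rows that are neither Group A nor Green
too_strict <- subset(data, Group != "A" & Color != "Green")

# Fix 1: keep rows that are not Group A OR not Green
keep_or <- subset(data, Group != "A" | Color != "Green")

# Fix 2: negate the combined condition instead
keep_not <- subset(data, !(Group == "A" & Color == "Green"))

nrow(too_strict)  # 4  (only the Red rows in Group B survive)
nrow(keep_or)     # 15 (only the 3 Green rows in Group A are dropped)
nrow(keep_not)    # 15

# No Green rows from Group A remain after either fix
table(keep_or$Group, keep_or$Color)
```

Either form works; the negated version tends to read closer to the way the requirement is phrased ("remove rows that are both green and group A").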