
How to remove rows in dataframe that meet 2 conditions


I am trying to remove rows in my dataframe that meet 2 conditions simultaneously. For example, based on the dataframe created below, I want to remove rows that are both green AND group A. However, based on the code I am using, rows are removed when they are green or group A.

data = data.frame(Group = c(rep('A',9), rep('B',9)),
                Color= c(rep('Red',3), rep('Green',3), rep('Yellow', 3), rep('Red',4), rep('Green',5)))

summary(data)
cols <- 1:2
data[, cols] <- lapply(data[, cols], factor)
summary(data)

newdata <- subset(data, Group != "A" & Color != "Green")
summary(newdata)

How can I get the result I am aiming for?


Solution

  • It sounds like you want this:

                 Group A       Not Group A
    Green        EXCLUDE       include
    Not Green    include       include
    

    Your line subset(data, Group != "A" & Color != "Green") keeps only rows that are BOTH Not Group A AND Not Green, which is just the bottom-right cell of the table above. You want to keep rows that are EITHER Not Group A OR Not Green, so use | (OR) where you currently have & (AND): subset(data, Group != "A" | Color != "Green").

    Or, as ~Darren-tsai noted, you could keep the rows that are not BOTH A and Green, i.e. use the condition !(Group == "A" & Color == "Green").
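
Here is a minimal sketch of both suggestions, run on the data frame from the question. The names keep_or and keep_not are made up for illustration. The two conditions are logically equivalent (De Morgan's law), so both calls should return the same rows.

```r
# Same data as in the question
data <- data.frame(
  Group = c(rep("A", 9), rep("B", 9)),
  Color = c(rep("Red", 3), rep("Green", 3), rep("Yellow", 3),
            rep("Red", 4), rep("Green", 5))
)

# Option 1: keep rows that are EITHER not group A OR not green
keep_or <- subset(data, Group != "A" | Color != "Green")

# Option 2: negate the combined condition "group A AND green"
keep_not <- subset(data, !(Group == "A" & Color == "Green"))

identical(keep_or, keep_not)   # TRUE: both approaches keep the same rows
nrow(data) - nrow(keep_or)     # 3: only the green rows in group A are dropped

# Cross-tabulate to confirm the A/Green cell is now empty
table(keep_or$Group, keep_or$Color)
```

Green rows in group B and the non-green rows in group A are still there; only the A/Green combination is gone.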