Tags: r, subset, categorical-data

Need to subset by excluding multiple values in a categorical variable


I have a categorical field, and I want to subset by 'excluding' multiple values.

Initially, I assumed I could either list the values to exclude directly in the code, or store them in a separate vector and reference that (see below).

subset(data, data$variable != c("x1", "x2", "x3"))

or

Exclude_Prod = c("x1", "x2", "x3")

subset(data, data$variable != Exclude_Prod)

The field is a single categorical variable that takes many values.

I want to exclude several of these values and then subset the data. I would rather exclude than include because there are fewer values to drop than to keep.


Solution

  • Try this, using %in% with negation instead of !=. Replace the names with your own variables; data3 is the dataset.

    Using some fake data:

    library(dplyr)

    data3 <- data.frame(Sales = c(11, 12, 13), Exclude_Prod = c("x1", "x2", "x3"))

    With base R:

    data3[!data3$Exclude_Prod %in% c("x1", "x2"), ]

    The "disadvantage" is that base R preserves the original row names, so the remaining row is still labelled 3. With dplyr:

    data3 %>%
      filter(!Exclude_Prod %in% c("x1", "x2"))
    

    Result:

      Sales Exclude_Prod
    1    13           x3
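
To make the row-name difference between the two approaches concrete, here is a minimal sketch using the same fake data. It uses only base R; `kept` is a name introduced here for illustration.

```r
# Same fake data as in the answer.
data3 <- data.frame(Sales = c(11, 12, 13),
                    Exclude_Prod = c("x1", "x2", "x3"))

# Base R subsetting keeps the original row name of the surviving row.
kept <- data3[!data3$Exclude_Prod %in% c("x1", "x2"), ]
rownames(kept)   # "3"

# Resetting the row names renumbers the rows from 1.
rownames(kept) <- NULL
rownames(kept)   # "1"
```

Whether the preserved row names matter depends on what you do next; if later code indexes rows by position, they are harmless.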
    

    Original Answer:

    mtcars %>%
      mutate(ID = row.names(.)) %>%
      select(ID) %>%
      filter(!ID %in% c("Volvo 142E", "Toyota Corona"))  # e.g. !Variable %in% c("x1", "x2", "x3")
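
As for why the attempts in the question do not work: `!=` compares element by element and recycles the shorter vector, so each row is tested against only one of the excluded values rather than all of them. The sketch below shows this on made-up data (the rows of `data` are invented for illustration; the names `data`, `variable`, and `Exclude_Prod` follow the question) and then the `subset()` form with `%in%`.

```r
# Made-up data; only the column name and excluded values come from the question.
data <- data.frame(variable = c("x1", "x2", "x3", "x4", "x1", "x5"),
                   stringsAsFactors = FALSE)
Exclude_Prod <- c("x1", "x2", "x3")

# Element-wise comparison with recycling: row 5 ("x1") is compared
# against "x2", so it is wrongly kept.
data$variable != Exclude_Prod
# FALSE FALSE FALSE  TRUE  TRUE  TRUE

# %in% tests each row against the whole set of excluded values.
result <- subset(data, !variable %in% Exclude_Prod)
result$variable   # "x4" "x5"
```

Inside `subset()` the column can be referred to by its bare name, so the `data$` prefix is not needed.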