Tags: r, dataframe, subset, categorical-data, levels

Subset dataset with several levels of a categorical variable


I want to subset a dataset so that it keeps only the rows matching several levels of a categorical variable in RStudio.

With the function "subset" I am able to do it for just one level:

new_df<-subset(df, df$cat.var=="level.1")

How do I subset on more than one level?


Solution

  • You can use %in%.

    This is a membership operator: it tests each value of cat.var against a vector of the levels whose rows you want to keep, and returns TRUE where there is a match.

    new_df <- subset(df, df$cat.var %in% c("level.1", "level.2"))
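
    To see why %in% is used here rather than ==, the following minimal sketch compares the two on a plain vector. The names x and keep are illustrative only and do not come from the question. With ==, R recycles the shorter vector and compares element by element, so matching rows can be silently missed.

    ```r
    # Illustrative values; x stands in for a column such as df$cat.var
    x <- c("a", "b", "c", "a", "b", "c")
    keep <- c("a", "b")

    # %in% asks, for each element of x, "is it one of the values in keep?"
    in_result <- x %in% keep
    print(in_result)
    # TRUE TRUE FALSE TRUE TRUE FALSE

    # == recycles keep to c("a", "b", "a", "b", "a", "b") and compares
    # position by position, so the second "a" and "b" are not matched
    eq_result <- x == keep
    print(eq_result)
    # TRUE TRUE FALSE FALSE FALSE FALSE
    ```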
    

    For example:

    df <- data.frame(fct = rep(letters[1:3], times = 2), nums = 1:6)
    
    df
    
    # This is our example data.frame
    #   fct nums
    # 1   a    1
    # 2   b    2
    # 3   c    3
    # 4   a    4
    # 5   b    5
    # 6   c    6
    
    subset(df, df$fct %in% c("a", "b"))
    
    # Subsetting on fct with %in% keeps only the rows where fct is "a" or "b":
    #   fct nums
    # 1   a    1
    # 2   b    2
    # 4   a    4
    # 5   b    5
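
    One thing that may matter afterwards, assuming the column is stored as a factor (in R 4.0 and later, data.frame() keeps strings as character unless you convert them): subsetting removes the rows but not the unused levels. A minimal sketch using droplevels() from base R, with the example data converted to a factor explicitly:

    ```r
    # Same example data, but with fct explicitly stored as a factor
    df <- data.frame(fct = factor(rep(letters[1:3], times = 2)), nums = 1:6)

    sub_df <- subset(df, fct %in% c("a", "b"))

    # The level "c" is still registered even though no rows use it
    levels_before <- levels(sub_df$fct)
    print(levels_before)
    # "a" "b" "c"

    # droplevels() removes unused levels from every factor column
    sub_df <- droplevels(sub_df)
    levels_after <- levels(sub_df$fct)
    print(levels_after)
    # "a" "b"
    ```

    This matters mainly for functions such as table() or plotting code, which would otherwise still report the empty level.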
    

    Note: Another option is to use the filter function from dplyr, as follows:

    library(dplyr)
    
    filter(df, fct %in% c("a", "b"))
    

    This returns the same rows as subset(), although dplyr renumbers the row names 1 to 4 instead of keeping the original 1, 2, 4, 5.