Subset data frame by factor cardinality?


I suspect this is a duplicate, but I haven't been able to find an answer. Suppose I have a data frame whose columns are all either integers or factors. Some of the factor columns have many levels and some have only a few. I want to subset the data so that I keep only the factor columns with fewer than 10 levels. How can I do this? My first thought was a particularly nasty sapply call, but I'm hoping for a better way.
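
To make the setup concrete, here is a small hypothetical df1 (the column names and sizes are made up for illustration) with two integer columns, a factor with few levels, and a factor with many levels. It also shows that nlevels() returns 0 for non-factor columns, which is what makes a levels-based filter straightforward.

```r
# Hypothetical example data: two integer columns and two factor columns
set.seed(1)
df1 <- data.frame(
  id    = 1:26,                                           # integer
  count = sample(0:100, 26, replace = TRUE),              # integer
  grp   = factor(rep(c("a", "b", "c"), length.out = 26)), # factor, 3 levels
  code  = factor(LETTERS)                                 # factor, 26 levels
)

# Number of levels per column; nlevels() is 0 for non-factors
n_lev <- sapply(df1, nlevels)
n_lev  # expected: id = 0, count = 0, grp = 3, code = 26
```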


Solution

  • We can use select_if from dplyr:

    library(dplyr)
    # keep only factor columns with fewer than 10 levels
    df1 %>%
        select_if(~ is.factor(.) && nlevels(.) < 10)
    

    A reproducible example using the built-in iris data set, where Species is the only factor (3 levels):

    data(iris)
    # returns a data frame containing only the Species column
    iris %>%
        select_if(~ is.factor(.) && nlevels(.) < 10)
    

    Or, in base R, using sapply:

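    # logical vector: TRUE for factor columns with fewer than 10 levels
    i1 <- sapply(df1, function(x) is.factor(x) && nlevels(x) < 10)
    df1[i1]  # list-style indexing returns a data frame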
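
Since dplyr 1.0.0, select_if() has been superseded; the current idiom is select() combined with the where() selection helper. The sketch below assumes dplyr 1.0.0 or later is installed and uses made-up example data. It compares that idiom with a base R variant using vapply(), which, unlike sapply(), is guaranteed to return a logical vector with one entry per column.

```r
library(dplyr)

# Hypothetical example data: an integer column plus factors of different cardinality
df1 <- data.frame(
  id   = 1:26,
  grp  = factor(rep(c("a", "b", "c"), length.out = 26)),  # 3 levels
  code = factor(LETTERS)                                   # 26 levels
)

# dplyr >= 1.0.0: select() with the where() helper and a formula predicate
low_card_dplyr <- df1 %>%
  select(where(~ is.factor(.x) && nlevels(.x) < 10))

# Base R: vapply() enforces a logical(1) result for every column
keep <- vapply(df1, function(x) is.factor(x) && nlevels(x) < 10, logical(1))
low_card_base <- df1[keep]

names(low_card_dplyr)  # "grp"
names(low_card_base)   # "grp"
```

Both versions keep only grp here; code is dropped because it has 26 levels, and id is dropped because it is not a factor.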