I suspect this is a duplicate, but I have not been able to find an answer. Suppose I have a data frame whose columns are all either integers or factors. Some of the factor columns have many levels and some do not. I want to subset the data so that I keep only the factor columns with fewer than 10 levels. How can I do this? My first thought was to write a particularly nasty sapply command, but I'm hoping for a better way.
We can use select_if from dplyr:
library(dplyr)
df1 %>%
  select_if(~ is.factor(.) && nlevels(.) < 10)
With a reproducible example using iris (this keeps only Species, a factor with 3 levels):
data(iris)
iris %>%
  select_if(~ is.factor(.) && nlevels(.) < 10)
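In dplyr 1.0.0 and later, select_if is superseded by select() combined with where(). A minimal sketch of the same selection in that style, assuming a recent dplyr is installed (the name res is only for illustration):

```r
library(dplyr)

# Keep only the factor columns with fewer than 10 levels.
# where() takes a predicate; the formula is shorthand for
# function(.x) is.factor(.x) && nlevels(.x) < 10
res <- iris %>%
  select(where(~ is.factor(.x) && nlevels(.x) < 10))

names(res)
```

The && short-circuits, so nlevels() is only evaluated for columns that are actually factors.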
Or using sapply in base R:
i1 <- sapply(df1, function(x) is.factor(x) && nlevels(x) < 10)
df1[i1]
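Since the question's df1 is not shown, here is a small sketch with a made-up data frame (the columns id, few and many are hypothetical) to check the base R approach. It also shows Filter(), which applies the same predicate without building the logical index by hand:

```r
# Hypothetical data: one integer column, one factor with 2 levels,
# and one factor with 26 levels
df1 <- data.frame(
  id   = 1:26,
  few  = factor(rep(c("a", "b"), 13)),
  many = factor(letters)
)

# Predicate: TRUE only for factors with fewer than 10 levels
keep <- function(x) is.factor(x) && nlevels(x) < 10

# Logical index, as in the sapply version above
i1 <- sapply(df1, keep)
by_index <- df1[i1]

# Filter() applies the predicate to each column and keeps the matches
by_filter <- Filter(keep, df1)

names(by_index)
names(by_filter)
```

Both calls should return a data frame containing only the few column here, since id is an integer and many has more than 10 levels.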