I have a dataframe that I split into a list of dataframes based on a categorical variable in the dataframe:
list <- split(mpg, mpg$manufacturer)
I want to filter the list so that it only includes dataframes where a given categorical column contains at least 5 unique values, and drop the dataframes with fewer than 5.
I have tried lapply with filter, but that filters the rows of each dataframe rather than the list itself. I also tried:
filteredlist <- lapply(list, function(x) length(unique(x$class) >= 5))
but I am stumped.
Thanks, any help would be appreciated!
First let's take a look at how many unique classes there are:
sapply(list, \(x) length(unique(x$class)))
#       audi  chevrolet      dodge       ford      honda    hyundai       jeep land rover    lincoln
#          2          3          3          3          1          2          1          1          1
#    mercury     nissan    pontiac     subaru     toyota volkswagen
#          1          3          1          3          4          3
So, with this data, `>= 5` isn't a great example because no manufacturer has 5 or more classes and the result would be empty. Let's use `>= 3` instead so we can expect a non-empty result.
## with Filter
filteredlist <- Filter(list, f = function(x) length(unique(x$class)) >= 3)
length(filteredlist)
# [1] 7
## or with sapply and `[`
sapply_filter = list[sapply(list, \(x) length(unique(x$class))) >= 3]
length(sapply_filter)
# [1] 7
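If ggplot2 (which provides mpg) isn't loaded, the same pattern can be checked on a small made-up dataframe. This is only a minimal sketch: the names df, grp, cls, dfs, n_classes and kept are hypothetical, and vapply is swapped in for sapply so that the counts are guaranteed to come back as one integer per dataframe.

```r
# Toy data (hypothetical, not from mpg):
# group "a" has 3 unique classes, "b" has 1, "c" has 2
df <- data.frame(
  grp = c("a", "a", "a", "b", "b", "c", "c"),
  cls = c("x", "y", "z", "x", "x", "y", "z")
)
dfs <- split(df, df$grp)

# Count the unique classes in each dataframe;
# vapply errors out if any result is not a single integer
n_classes <- vapply(dfs, function(d) length(unique(d$cls)), integer(1))

# Subset the list itself with the resulting logical vector
kept <- dfs[n_classes >= 3]
names(kept)
# [1] "a"
```

The subsetting step works on the list as a whole, so each surviving dataframe keeps all of its rows.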
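Note that your attempt, lapply(list, function(x) length(unique(x$class) >= 5)), has a misplaced parenthesis: you want length(unique(x$class)) >= 5, not length(unique(x$class) >= 5). As written, the comparison is applied to the unique values themselves, so length() just counts them and returns a number instead of TRUE or FALSE. Even with the parenthesis fixed, lapply only returns a list of TRUE/FALSE values, so you still need Filter or `[` to subset the list.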
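To see what the parenthesis placement changes, here is a small sketch on a made-up vector (cls, wrong and right are hypothetical names, not part of your data):

```r
# Hypothetical class column with 3 unique values
cls <- c("x", "y", "y", "z")

# Misplaced parenthesis: the comparison runs first, so length() counts
# the elements of a logical vector and returns a number
wrong <- length(unique(cls) >= 5)
wrong
# [1] 3

# Intended: count the unique values first, then compare the count
right <- length(unique(cls)) >= 5
right
# [1] FALSE
```

Only the second form gives a single TRUE/FALSE per dataframe, which is what Filter and logical subsetting with `[` expect.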