I have asked a similar question on here before about how to count unique values in a data frame, but I need to use "lapply" this time, because the approach I used previously doesn't work with a list (or I can't get it to work). I have also been told that using one of the apply functions would be better.
This represents my data:
species1 <- data.frame(var_1 = c("a","a","a","b", "b", "b"), var_2 = c("c","c","d", "d", "e", "e"))
species2 <- data.frame(var_1 = c("f","f","f","g", "g", "g"), var_2 = c("h","h","i", "i", "j", "j"))
all_species <- list()
all_species[["species1"]] <- species1
all_species[["species2"]] <- species2
I want to use lapply to get the number of unique rows for each data frame in my list. For example, I need an output like:
count_all_species <- list()
count_all_species[["species1"]] <- data.frame(var_1 = c("a", "b"), unique_number = c("2", "2"))
Then the same for the second data frame in the list, using the "lapply" function.
Here is an option with the tidyverse. We loop through the list of data.frames (with map), group by 'var_1', and summarise to get the number of distinct elements in 'var_2' (n_distinct):
library(dplyr)
library(purrr)
map(all_species, ~ .x %>%
    group_by(var_1) %>%
    summarise(unique_number = n_distinct(var_2)))
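Since the question asks specifically for lapply, the same pipeline can also be run through base R's lapply in place of purrr::map. This is a minimal sketch that assumes the example data from the question and that dplyr is installed; the name count_all_species is taken from the question's desired output.

```r
library(dplyr)

# Example data from the question
species1 <- data.frame(var_1 = c("a", "a", "a", "b", "b", "b"),
                       var_2 = c("c", "c", "d", "d", "e", "e"))
species2 <- data.frame(var_1 = c("f", "f", "f", "g", "g", "g"),
                       var_2 = c("h", "h", "i", "i", "j", "j"))
all_species <- list(species1 = species1, species2 = species2)

# lapply returns a list with the same names as all_species;
# each element is a summary of the corresponding data frame
count_all_species <- lapply(all_species, function(df) {
  df %>%
    group_by(var_1) %>%
    summarise(unique_number = n_distinct(var_2))
})

count_all_species[["species1"]]
```

For a plain list of data frames like this one, lapply and map should behave the same way; map mainly adds the formula shorthand (~ .x).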
Or, while looping through the list, use distinct to drop duplicate rows and then count the remaining rows per 'var_1':
map(all_species, ~ .x %>%
    distinct() %>%
    dplyr::count(var_1))
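If the column name changes between data.frames, we can refer to the column by position in summarise_at. Positions count only the non-grouping columns, so 1 is 'var_2' here, and the result column keeps that original name: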
map(all_species, ~ .x %>%
    group_by(var_1) %>%
    summarise_at(1, n_distinct))
Or another option is to convert the column name string to a symbol (rlang::sym) and then unquote it for evaluation (!!):
map(all_species, ~ .x %>%
    group_by(var_1) %>%
    summarise(unique_number = n_distinct(!! rlang::sym(names(.x)[2]))))
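If a solution without any packages is preferred, the same counts can probably be obtained with lapply and tapply alone. This is a sketch under the assumption that every data frame in the list has the columns var_1 and var_2, as in the question's example; it is a swapped-in base R technique, not part of the tidyverse approach above.

```r
# Example data from the question
species1 <- data.frame(var_1 = c("a", "a", "a", "b", "b", "b"),
                       var_2 = c("c", "c", "d", "d", "e", "e"))
species2 <- data.frame(var_1 = c("f", "f", "f", "g", "g", "g"),
                       var_2 = c("h", "h", "i", "i", "j", "j"))
all_species <- list(species1 = species1, species2 = species2)

count_all_species <- lapply(all_species, function(df) {
  # For each var_1 group, count the distinct values of var_2
  counts <- tapply(df$var_2, df$var_1, function(x) length(unique(x)))
  # Turn the named result into a two-column data frame
  data.frame(var_1 = names(counts),
             unique_number = as.vector(counts),
             row.names = NULL)
})

count_all_species[["species1"]]
```

Note that unique_number comes back as an integer column here, whereas the desired output in the question stores the counts as character strings; wrapping the counts in as.character() would match that exactly if it matters.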