I need to drop factor variables that have only one level (ignoring NAs) in a nested dataset. The function 'drop_fixed_factors' below counts NA as a level when it evaluates the number of distinct factor values. How can I fix it so that, for A == "Y", B is treated as having one level (A) rather than two (A and NA)?
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble::tribble(
  ~A, ~B,
  "X", "A",
  "X", "B",
  "Y", "A",
  "Y", NA_character_,
  "Z", "A",
  "Z", "B",
  "Z", NA_character_,
  "K", "A",
  "K", "A",
  "L", NA_character_,
  "L", NA_character_
)
df$B <- as.factor(df$B)
dfgrp <- df %>%
  group_by(A) %>%
  nest()
drop_fixed_factors <- function(x) {
  x %>% discard(~is.factor(.x) & length(unique(.x)) < 2)
}
dfgrp1 <- dfgrp %>%
  mutate(data_1 = map(data, ~drop_fixed_factors(.x)))
dfgrp1
dfgrp1$data_1[[2]]  # nested data for group "Y": B is still present
The desired output should not have variable B for the group A == "Y".
You can remove the NA values with na.omit() before calling unique():
drop_fixed_factors <- function(x) {
  x %>% discard(~is.factor(.x) & length(unique(na.omit(.x))) < 2)
}
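To see why the original check fails, here is a small base-R sketch on a hand-built vector (b_y is a hypothetical name) that mimics column B inside group "Y". It assumes, as in the nested data, that the subset factor still carries both levels "A" and "B" from the full column, which also shows why nlevels() alone would not be a fix.

```r
# Hypothetical stand-in for column B inside group A == "Y":
# one real value ("A") plus a missing value. Subsetting a factor keeps
# all levels of the original column, so "B" is still a level here.
b_y <- factor(c("A", NA), levels = c("A", "B"))

n_levels     <- nlevels(b_y)                 # 2: levels are inherited, not observed
n_with_na    <- length(unique(b_y))          # 2: unique() keeps NA as a value
n_without_na <- length(unique(na.omit(b_y))) # 1: only "A" remains

print(c(levels = n_levels, with_na = n_with_na, without_na = n_without_na))
```

Only the na.omit() count reflects the number of distinct non-missing values actually observed in the group.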
Alternatively, use dplyr::n_distinct(), whose na.rm argument excludes NA from the count:
drop_fixed_factors <- function(x) {
  x %>% discard(~is.factor(.x) & n_distinct(.x, na.rm = TRUE) < 2)
}
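With either version, B is dropped for group "Y", leaving a 2 x 0 tibble. Groups "K" (only "A") and "L" (only NA) lose B as well, while "X" and "Z" keep it. After redefining the function, rerun the mutate() step:
dfgrp1 <- dfgrp %>%
  mutate(data_1 = map(data, ~drop_fixed_factors(.x)))
dfgrp1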
# A tibble: 5 x 3
A data data_1
<chr> <list> <list>
1 X <tibble [2 x 1]> <tibble [2 x 1]>
2 Y <tibble [2 x 1]> <tibble [2 x 0]>
3 Z <tibble [3 x 1]> <tibble [3 x 1]>
4 K <tibble [2 x 1]> <tibble [2 x 0]>
5 L <tibble [2 x 1]> <tibble [2 x 0]>
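If you would rather not depend on purrr and tidyr, the same idea can be sketched in base R. This is an illustration under assumptions, not a drop-in replacement: is_fixed_factor() and drop_fixed_factors_base() are hypothetical helper names, the data is rebuilt as a plain data.frame, and split() stands in for nest(), so groups come back in alphabetical order rather than order of appearance.

```r
# Toy data mirroring the question's df (B is a factor containing NAs)
df <- data.frame(
  A = c("X", "X", "Y", "Y", "Z", "Z", "Z", "K", "K", "L", "L"),
  B = factor(c("A", "B", "A", NA, "A", "B", NA, "A", "A", NA, NA))
)

# Hypothetical helper: TRUE for factor columns with fewer than two
# distinct non-NA values
is_fixed_factor <- function(x) {
  is.factor(x) && length(unique(x[!is.na(x)])) < 2
}

# Hypothetical base-R counterpart of purrr::discard(): keep only the
# columns that are not fixed factors
drop_fixed_factors_base <- function(d) {
  d[, !vapply(d, is_fixed_factor, logical(1)), drop = FALSE]
}

# Split by A (dropping A from each piece, as nest() does) and clean each group
groups <- split(df[, "B", drop = FALSE], df$A)
cleaned <- lapply(groups, drop_fixed_factors_base)

# Number of columns left per group
print(vapply(cleaned, ncol, integer(1)))
```

The per-group column counts match the tibble output above: B survives only for "X" and "Z".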