I want to filter a nested dataframe for specific combinations of variables using a named list and I can't manage to exclude some unwanted combinations. Here's an example :
library(tidyverse)
# Create fake data
set.seed(1234)
data <- tibble(
c1 = rep(letters[1:3], each = 10),
c2 = sample(letters[4:6], size = 30, replace = T),
var1 = rnorm(30),
var2 = rnorm(30)
)
nested_data <- data %>%
nest(.by = c(c1, c2))
# Create list of the specific combinations I want
criteria <- list(a = c("d", "e"), b = "d")
I tried to use the functions names()
and unique()
to do it but the result does not exclude unwanted combinations with overlapping criterias.
# Filter for the specific combinations
c1_criteria <- names(criteria)
c2_criteria <- unique(unlist(criteria))
nested_data %>%
filter(c1 %in% c1_criteria,
c2 %in% c2_criteria) %>%
unnest(data)
Here's the output
# A tibble: 4 × 3
c1 c2 data
<chr> <chr> <list>
1 a e <tibble [5 × 2]>
2 a d <tibble [3 × 2]>
3 b e <tibble [6 × 2]>
4 b d <tibble [1 × 2]>
I intended to have only the following combinations :
c1 == "a" & c2 == "d"
,
c1 == "a" & c2 == "e"
,
c1 == "b" & c2 == "d
"
However, the output also includes the combination, c1 == "b" & c2 == "e"
. So the intended output is the following :
# A tibble: 3 × 3
c1 c2 data
<chr> <chr> <list>
1 a e <tibble [5 × 2]>
2 a d <tibble [3 × 2]>
3 b d <tibble [1 × 2]>
I think there might be a way to generate a list of specific logical conditions from the named list criterias
and supply it as argument to the filter function but I'm not sure how to do it.
We could turn that list into a 2-column filtering table that holds desired combinations of c1 & c2, then it can be used with semi_join()
:
library(tidyverse)
set.seed(1234)
data <- tibble(
c1 = rep(letters[1:3], each = 10),
c2 = sample(letters[4:6], size = 30, replace = T),
var1 = rnorm(30),
var2 = rnorm(30)
)
nested_data <- data %>%
nest(.by = c(c1, c2))
criteria <- list(a = c("d", "e"), b = "d")
enframe(criteria, "c1", "c2") %>%
unnest(c2) %>%
semi_join(nested_data, .)
#> Joining with `by = join_by(c1, c2)`
#> # A tibble: 3 × 3
#> c1 c2 data
#> <chr> <chr> <list>
#> 1 a e <tibble [5 × 2]>
#> 2 a d <tibble [3 × 2]>
#> 3 b d <tibble [1 × 2]>
List transformation steps:
enframe(criteria, name = "c1", value = "c2")
#> # A tibble: 2 × 2
#> c1 c2
#> <chr> <list>
#> 1 a <chr [2]>
#> 2 b <chr [1]>
enframe(criteria, "c1", "c2") %>% unnest(c2)
#> # A tibble: 3 × 2
#> c1 c2
#> <chr> <chr>
#> 1 a d
#> 2 a e
#> 3 b d
Created on 2023-10-02 with reprex v2.0.2