Tags: r, dplyr, filter, tidyverse, plyr

How to filter nested data


How can I filter a nested dataset so that only the rows whose nested values exactly match a reference vector (or tibble) are kept?

library(tidyverse)

rev_vec <- c("apple", "pear", "banana")

df <- tibble(
  ID = rep(1:3, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA))

df_vec <- df %>%
  group_by(ID) %>%
  summarise(fruits = list(unique(fruits)))

## This does not work
df_vec %>% 
  filter(fruits == rev_vec)

## This does not work
df_vec %>% 
  filter(unlist(fruits) == rev_vec)

## This does not work
df_vec %>% 
  filter(all(unlist(fruits[[1]]) == rev_vec))

Basically, I just need to know which ID (in this case 1) matches the reference vector.

Expected outcome

Only ID 1 matches rev_vec.

df_vec %>%
   filter(....)
# A tibble: 1 x 2
     ID fruits   
  <int> <list>   
1     1 <chr [3]>

Solution

df_vec %>%
  filter(map_lgl(fruits, ~setequal(., rev_vec)))

# A tibble: 1 x 2
     ID fruits   
  <int> <list>   
1     1 <chr [3]>
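
Why this works, as far as I can tell: filter() needs one TRUE/FALSE per row, and map_lgl() produces exactly that by testing each list element separately. The attempts in the question compare the whole list column at once. The third one only inspects fruits[[1]] and returns a single TRUE that gets recycled across every row. Note that setequal() ignores order and duplicates. If the order has to match too, identical() is one option. The sketch below rebuilds the question's data and compares the two. The names ids_set and ids_exact are my own and do not appear in the original post.

```r
library(dplyr)
library(purrr)

rev_vec <- c("apple", "pear", "banana")

# Same data as in the question
df <- tibble(
  ID = rep(1:3, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA))

df_vec <- df %>%
  group_by(ID) %>%
  summarise(fruits = list(unique(fruits)))

# Order-insensitive: the nest holds the same set of values as rev_vec
ids_set <- df_vec %>%
  filter(map_lgl(fruits, ~ setequal(.x, rev_vec))) %>%
  pull(ID)

# Order-sensitive: same values, same order, same type
ids_exact <- df_vec %>%
  filter(map_lgl(fruits, ~ identical(.x, rev_vec))) %>%
  pull(ID)

# The two tests differ as soon as the order changes
setequal(rev(rev_vec), rev_vec)   # TRUE
identical(rev(rev_vec), rev_vec)  # FALSE
```

With this data both filters keep only ID 1, because the fruits for ID 1 happen to be in the same order as rev_vec. Which test is right depends on whether "exact same" in your case includes the ordering.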