Search code examples
rlistfiltertidyversense

Filter dataframe when column name-value pairs are stored in a list?


I have a dataframe like:

df <- tibble::rownames_to_column(USArrests, "State") %>% 
  tidyr::pivot_longer(cols = -State)

head(df)
# A tibble: 6 x 3
  State   name     value
  <chr>   <chr>    <dbl>
1 Alabama Murder    13.2
2 Alabama Assault  236  
3 Alabama UrbanPop  58  
4 Alabama Rape      21.2
5 Alaska  Murder    10  
6 Alaska  Assault  263  

In a separate list object l I have columns and that I need to remove from the dataframe. The element names are the column names and the values correspond to rows I want to remove:

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name = c("Murder", "Assault"))

Hardcoded it would do this:

dplyr::filter(df, !State %in% c("Alabama", "Pennsylvania", "Texas"), !name %in% c("Murder", "Assault"))

   State      name     value
   <chr>      <chr>    <dbl>
 1 Alaska     UrbanPop  48  
 2 Alaska     Rape      44.5
 3 Arizona    UrbanPop  80  
 4 Arizona    Rape      31  
 5 Arkansas   UrbanPop  50  
 6 Arkansas   Rape      19.5
 7 California UrbanPop  91  
 8 California Rape      40.6
 9 Colorado   UrbanPop  78  
10 Colorado   Rape      38.7
# ... with 84 more rows

However, l changes often so I cannot/don't want to hardcode. I attempted the following, but only the last expression is being evaluated:

library(purrr)
filter_expr <- imap_chr(l, ~ paste0("! ", 
                     .y, 
                     " %in% c(\"", 
                     paste(.x, collapse = "\",\""), 
                     "\")")) %>% parse(text = .)

filter(df, eval(filter_expr))

   State      name     value
   <chr>      <chr>    <dbl>
 1 Alabama    UrbanPop  58  
 2 Alabama    Rape      21.2
 3 Alaska     UrbanPop  48  
 4 Alaska     Rape      44.5
 5 Arizona    UrbanPop  80  
 6 Arizona    Rape      31  
 7 Arkansas   UrbanPop  50  
 8 Arkansas   Rape      19.5
 9 California UrbanPop  91  
10 California Rape      40.6
# ... with 90 more rows

Is there a way to filter df when the filter criteria are stored in a structure like l that is more idiomatic to the tidyverse?

I considered this SO answer, however, the expressions are not dynamic.


Solution

  • We could use across in filter looping over the names of 'l', created the logical expression by subsetting the 'l' using the key from column name (cur_column()) and negate (!). Note that cur_column() works currently only with across and not if_all/if_any (dplyr -1.0.6 on R 4.1.0)

    library(dplyr)
    df %>% 
       filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))
    

    -output

    # A tibble: 94 x 3
    #   State      name     value
    #   <chr>      <chr>    <dbl>
    # 1 Alaska     UrbanPop  48  
    # 2 Alaska     Rape      44.5
    # 3 Arizona    UrbanPop  80  
    # 4 Arizona    Rape      31  
    # 5 Arkansas   UrbanPop  50  
    # 6 Arkansas   Rape      19.5
    # 7 California UrbanPop  91  
    # 8 California Rape      40.6
    # 9 Colorado   UrbanPop  78  
    #10 Colorado   Rape      38.7
    # … with 84 more rows
    

    We could make use of if_all if we can set an attribute

    library(magrittr)
    df %>% 
      mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>% 
      filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))
    

    Or with imap/reduce

    library(purrr)
    df %>%
        filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
                     reduce(`&`))
    

    Or another option is anti_join

    for(nm in names(l)) df <- anti_join(df, tibble(!! nm := l[[nm]]))