I have a dataframe like:
df <- tibble::rownames_to_column(USArrests, "State") %>%
tidyr::pivot_longer(cols = -State)
head(df)
# A tibble: 6 x 3
State name value
<chr> <chr> <dbl>
1 Alabama Murder 13.2
2 Alabama Assault 236
3 Alabama UrbanPop 58
4 Alabama Rape 21.2
5 Alaska Murder 10
6 Alaska Assault 263
In a separate list object l
I have columns and that I need to remove from the dataframe. The element names are the column names and the values correspond to rows I want to remove:
l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
name = c("Murder", "Assault"))
Hardcoded it would do this:
dplyr::filter(df, !State %in% c("Alabama", "Pennsylvania", "Texas"), !name %in% c("Murder", "Assault"))
State name value
<chr> <chr> <dbl>
1 Alaska UrbanPop 48
2 Alaska Rape 44.5
3 Arizona UrbanPop 80
4 Arizona Rape 31
5 Arkansas UrbanPop 50
6 Arkansas Rape 19.5
7 California UrbanPop 91
8 California Rape 40.6
9 Colorado UrbanPop 78
10 Colorado Rape 38.7
# ... with 84 more rows
However, l
changes often so I cannot/don't want to hardcode. I attempted the following, but only the last expression is being evaluated:
library(purrr)
filter_expr <- imap_chr(l, ~ paste0("! ",
.y,
" %in% c(\"",
paste(.x, collapse = "\",\""),
"\")")) %>% parse(text = .)
filter(df, eval(filter_expr))
State name value
<chr> <chr> <dbl>
1 Alabama UrbanPop 58
2 Alabama Rape 21.2
3 Alaska UrbanPop 48
4 Alaska Rape 44.5
5 Arizona UrbanPop 80
6 Arizona Rape 31
7 Arkansas UrbanPop 50
8 Arkansas Rape 19.5
9 California UrbanPop 91
10 California Rape 40.6
# ... with 90 more rows
Is there a way to filter df
when the filter criteria are stored in a structure like l
that is more idiomatic to the tidyverse?
I considered this SO answer, however, the expressions are not dynamic.
We could use across
in filter
looping over the names
of 'l', created the logical expression by subsetting the 'l' using the key from column name (cur_column()
) and negate (!
). Note that cur_column()
works currently only with across
and not if_all/if_any
(dplyr
-1.0.6
on R 4.1.0
)
library(dplyr)
df %>%
filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))
-output
# A tibble: 94 x 3
# State name value
# <chr> <chr> <dbl>
# 1 Alaska UrbanPop 48
# 2 Alaska Rape 44.5
# 3 Arizona UrbanPop 80
# 4 Arizona Rape 31
# 5 Arkansas UrbanPop 50
# 6 Arkansas Rape 19.5
# 7 California UrbanPop 91
# 8 California Rape 40.6
# 9 Colorado UrbanPop 78
#10 Colorado Rape 38.7
# … with 84 more rows
We could make use of if_all
if we can set an attribute
library(magrittr)
df %>%
mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>%
filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))
Or with imap/reduce
library(purrr)
df %>%
filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
reduce(`&`))
Or another option is anti_join
for(nm in names(l)) df <- anti_join(df, tibble(!! nm := l[[nm]]))