I would like to filter
my tibble
on several columns with similar names. Specificall I'd like to compare x
with x_new
, y
with y_new
and so on but without specifying the explicit name, but by using the structure in columns names.
I tried to use filter_at
, but this is not working as I don't know how to evaluate the formula in the last line properly.
my_df %>%
filter_at(vars(contains("_new")), any_vars(funs({
x <- .
x_name <- quo_name(quo(x))
x_new_name <- str_replace(x_name, "_new", "")
paste(x_name, "!=", x_new_name)
})
))
Data
my_df <- tibble(x = 1:5,
x_new = c(1:4, 1),
y = letters[1:5],
y_new = c(letters[1:3], "a", "e"))
# A tibble: 5 x 4
# x x_new y y_new
# <int> <dbl> <chr> <chr>
# 1 1 1. a a
# 2 2 2. b b
# 3 3 3. c c
# 4 4 4. d a
# 5 5 1. e e
Expected output
# A tibble: 2 x 4
# x x_new y y_new
# <int> <dbl> <chr> <chr>
# 1 4 4. d a
# 2 5 1. e e
We could do this with map
. Create a vector of unique
names by removing the suffix part of the column names ('nm1'). Loop through the 'nm1', select the columns that matches
the column name, reduce
it to a single logical vector by checking whether the rows are not equal, then reduce
the list
of logical vectors to a single logical vector and extract
the rows based on that
library(tidyverse)
nm1 <- unique(sub("_.*", "", names(my_df)))
map(nm1, ~ my_df %>%
select_at(vars(matches(.x))) %>%
reduce(`!=`)) %>%
reduce(`|`) %>%
magrittr::extract(my_df, ., )
# x x_new y y_new
# <int> <dbl> <chr> <chr>
#1 4 4 d a
#2 5 1 e e
Another option is to create an expression and then evaluate
library(rlang)
nm1 <- names(my_df) %>%
split(sub("_.*", "", .)) %>%
map(~ paste(.x, collapse=" != ") %>%
paste0("(", ., ")")) %>%
reduce(paste, sep = "|")
my_df %>%
filter(!! parse_expr(nm1))
# A tibble: 2 x 4
# x x_new y y_new
# <int> <dbl> <chr> <chr>
#1 4 4 d a
#2 5 1 e e