Search code examples
rdplyrnse

Filter based on several columns with similar naming structure


I would like to filter my tibble on several columns with similar names. Specificall I'd like to compare x with x_new, y with y_new and so on but without specifying the explicit name, but by using the structure in columns names.

I tried to use filter_at, but this is not working as I don't know how to evaluate the formula in the last line properly.

my_df %>% 
  filter_at(vars(contains("_new")), any_vars(funs({
    x <- .
    x_name <- quo_name(quo(x))
    x_new_name <- str_replace(x_name, "_new", "")
    paste(x_name, "!=", x_new_name)
  })
  ))

Data

my_df <- tibble(x = 1:5, 
                x_new = c(1:4, 1), 
                y = letters[1:5], 
                y_new = c(letters[1:3], "a", "e"))

# A tibble: 5 x 4
#       x x_new y     y_new
#   <int> <dbl> <chr> <chr>
# 1     1    1. a     a    
# 2     2    2. b     b    
# 3     3    3. c     c    
# 4     4    4. d     a    
# 5     5    1. e     e   

Expected output

# A tibble: 2 x 4
#       x x_new y     y_new
#   <int> <dbl> <chr> <chr>
# 1     4    4. d     a    
# 2     5    1. e     e 

Solution

  • We could do this with map. Create a vector of unique names by removing the suffix part of the column names ('nm1'). Loop through the 'nm1', select the columns that matches the column name, reduce it to a single logical vector by checking whether the rows are not equal, then reduce the list of logical vectors to a single logical vector and extract the rows based on that

    library(tidyverse)
    nm1 <- unique(sub("_.*", "", names(my_df)))
    map(nm1, ~ my_df %>% 
                    select_at(vars(matches(.x))) %>% 
                    reduce(`!=`)) %>% 
           reduce(`|`) %>% 
           magrittr::extract(my_df, ., )
    #    x x_new y     y_new
    #  <int> <dbl> <chr> <chr>
    #1     4     4 d     a    
    #2     5     1 e     e    
    

    Another option is to create an expression and then evaluate

    library(rlang)
    nm1 <- names(my_df) %>% 
              split(sub("_.*", "", .)) %>%
              map(~ paste(.x, collapse=" != ") %>%
                       paste0("(", ., ")")) %>%
              reduce(paste, sep = "|")
    my_df %>%
       filter(!! parse_expr(nm1))
    # A tibble: 2 x 4
    #     x x_new y     y_new
    # <int> <dbl> <chr> <chr>
    #1    4     4 d     a    
    #2    5     1 e     e