What is the correct way to use any(), all(), etc., with the dplyr::filter() + dplyr::across() combination?


Say I have the following data.frame df:

#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello        10          3
# 2 WHAT_hello  WHAT_hello WHAT_hello         1          2
# 3      Hello       Hello      Hello         9          1

I would like to process the data.frame to only retain those rows that contain the prefix WHAT_ in at least one of col1, col2, or col3.

Now I know that I can do this easily with |, but I was trying to achieve this using dplyr::across and tidyselect::matches along with base::any and stringr::str_detect to point dplyr::filter at the right columns. But this doesn't seem to work, even when used in conjunction with dplyr::rowwise.

So what is the correct way to go about this here? What am I doing wrong?

I would like to use across + any primarily because I might not know in advance how many of these columns the actual dataset will have.

Here's my example (data + code) below:

#Libraries (base is always attached, so it needs no library() call).
library(dplyr)
library(tidyselect)
library(stringr)
library(magrittr)



#Toy data.
#othercol1 and othercol11 come from sample() without a seed,
#so their values differ between the outputs shown below.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3),
                 othercol11 = sample(1:10, 3),
                 stringsAsFactors = FALSE)



#Works.
df %>%
  filter(str_detect(col1, "^WHAT_") |
           str_detect(col2, "^WHAT_") |
           str_detect(col3, "^WHAT_"))

#Output.
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4


#Runs, but returns the wrong rows (all three are kept).
df %>% 
  filter(
    across(.cols = matches("^col"), 
           .fns = ~ any(str_detect(.x, "^WHAT")) )
  )

#Output.
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4
# 3      Hello       Hello      Hello         4          7



#Also runs, but returns the wrong rows (only the row where every column matches).
df %>% 
  rowwise() %>%
  filter(
    across(.cols = matches("^col"), 
           .fns = ~ any(str_detect(.x, "^WHAT")) )
  )

#Output.
#   col1       col2       col3       othercol1 othercol11
#   <chr>      <chr>      <chr>          <int>      <int>
# 1 WHAT_hello WHAT_hello WHAT_hello         5          4

Solution
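
Before the fixes, it may help to see why both attempts in the question misbehave. The sketch below is an illustration, not part of the original answer: it rebuilds the toy data without the random columns and inspects what any() actually receives in each case. The names per_column, per_cell, all_cols and any_col are made up for this illustration.

```r
library(dplyr)
library(stringr)

# Toy data from the question, without the random numeric columns.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 stringsAsFactors = FALSE)

# Ungrouped attempt: across() hands any() one whole column at a time,
# so each column collapses to a single TRUE/FALSE.
per_column <- df %>%
  summarise(across(matches("^col"), ~ any(str_detect(.x, "^WHAT_"))))
# Every column contains at least one match, so all three values are TRUE.
# filter() recycles that single TRUE and keeps every row.

# Rowwise attempt: any() now sees a single cell, so it changes nothing.
# filter() then combines the per-column results with AND, not OR.
per_cell <- df %>%
  mutate(across(matches("^col"), ~ str_detect(.x, "^WHAT_")))
all_cols <- per_cell$col1 & per_cell$col2 & per_cell$col3  # what the rowwise attempt tests
any_col  <- per_cell$col1 | per_cell$col2 | per_cell$col3  # what was actually wanted
```

Here all_cols is TRUE only for row 2, which matches the single row returned by the rowwise attempt, while any_col is TRUE for rows 1 and 2.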

  • any() has to be evaluated per row, over the selected columns, rather than per column. With rowwise(), c_across() collects the selected columns of the current row into a single vector:

    df %>% 
      rowwise() %>% 
      filter(any(str_detect(c_across(matches('^col')), '^WHAT')))
    
    # # A tibble: 2 x 5
    # # Rowwise: 
    #   col1       col2        col3       othercol1 othercol11
    #   <chr>      <chr>       <chr>          <int>      <int>
    # 1 Hello      WHAT_hello2 Hello              9          7
    # 2 WHAT_hello WHAT_hello  WHAT_hello         3         10
    

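    Or, without rowwise(), use across() to turn each selected column into a logical column and keep the rows where rowSums() is greater than 0: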

    # TRUE for each row in which at least one selected column matches.
    row_lgl <- 
      df %>% 
        transmute(across(.cols = matches("^col"), .fns = ~ str_detect(.x, "^WHAT"))) %>% 
        rowSums() > 0

    df %>% 
      filter(row_lgl)
    #         col1        col2       col3 othercol1 othercol11
    # 1      Hello WHAT_hello2      Hello         9          7
    # 2 WHAT_hello  WHAT_hello WHAT_hello         3         10
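
With dplyr 1.0.4 or later (an assumption about the installed version), if_any() can express the same filter directly, without rowwise() or a helper vector. This is a minimal sketch rather than part of the original answer. The numeric columns are hard-coded from the table printed at the top of the question instead of using sample(), so the result is reproducible, and res is a hypothetical name.

```r
library(dplyr)
library(stringr)

# Toy data, with fixed numbers copied from the printed table in the question.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# if_any() applies the predicate to each selected column and combines the
# results row by row with OR (if_all() would combine them with AND).
res <- df %>%
  filter(if_any(matches("^col"), ~ str_detect(.x, "^WHAT_")))

res  # keeps rows 1 and 2
```

Because matches("^col") selects by pattern, the call should not need to change when the real dataset has more or fewer such columns.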