r  dplyr  tidyverse  stringr  across

How to use filter, across and str_detect together to filter conditionally on multiple columns


I have this dataframe:

df <- structure(list(col1 = c("Z2", "A2", "B2", "C2", "A2", "E2", "F2", 
"G2"), col2 = c("Z2", "Z2", "A2", "B2", "C2", "D2", "A2", "F2"
), col3 = c("A2", "B2", "C2", "D2", "E2", "F2", "G2", "Z2")), class = "data.frame", row.names = c(NA, -8L))

> df
  col1 col2 col3
1   Z2   Z2   A2
2   A2   Z2   B2
3   B2   A2   C2
4   C2   B2   D2
5   A2   C2   E2
6   E2   D2   F2
7   F2   A2   G2
8   G2   F2   Z2

I would like to explicitly use filter, across and str_detect in a tidyverse setting to keep every row in which at least one of col1:col3 starts with an A.

Expected result:

  col1 col2 col3
1   Z2   Z2   A2
2   A2   Z2   B2
3   B2   A2   C2
4   A2   C2   E2
5   F2   A2   G2

I have tried:

library(dplyr)
library(stringr)
df %>% 
    filter(across(c(col1, col2, col3), ~str_detect(., "^A")))

This gives:

[1] col1 col2 col3
<0 rows> (or 0-length row.names)

I want to learn why this code is not working using filter, across and str_detect!


Solution

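  • Use if_any() instead. Inside filter(), across() combines the per-column results with & (logical AND), so a row is kept only when every selected column meets the condition. No row in df has an A at the start of all three columns, hence the empty result.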

    library(dplyr)
    library(stringr)
    df %>% 
        filter(if_any(everything(), ~str_detect(., "^A"))) 
    

    Output:

       col1 col2 col3
    1   Z2   Z2   A2
    2   A2   Z2   B2
    3   B2   A2   C2
    4   A2   C2   E2
    5   F2   A2   G2
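
To see why the original attempt returns no rows, it can help to look at the per-column results directly. The following is only a sketch: it rebuilds df with data.frame(), and the names hits, all_match and any_match are illustrative, not part of the question.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  col1 = c("Z2", "A2", "B2", "C2", "A2", "E2", "F2", "G2"),
  col2 = c("Z2", "Z2", "A2", "B2", "C2", "D2", "A2", "F2"),
  col3 = c("A2", "B2", "C2", "D2", "E2", "F2", "G2", "Z2")
)

# Replace every column by a logical: TRUE where the value starts with "A"
hits <- df %>%
  mutate(across(col1:col3, ~ str_detect(.x, "^A")))

# AND across columns: what filter(across(...)) effectively asks for
all_match <- rowSums(hits) == ncol(hits)

# OR across columns: what if_any() asks for
any_match <- rowSums(hits) > 0

sum(all_match)            # how many rows satisfy the AND condition
unname(which(any_match))  # row positions that satisfy the OR condition
```

With this data no row has TRUE in all three columns, while rows 1, 2, 3, 5 and 7 have at least one TRUE, which matches the expected result.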
    

    According to ?across

    if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, if_all() is TRUE when the predicate is TRUE for all selected columns.

    across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

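    Note that if_any() and if_all() are not among those scoped variants. They are companions to across() that reduce the per-column results to a single logical vector, which is what filter() needs.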
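
As a rough side-by-side of the two helpers quoted above, the sketch below runs both on the same data and also spells out the OR condition by hand. The object names any_rows, all_rows and explicit_rows are made up for illustration, and filter_at() with any_vars() is shown only as the older scoped form that across() and its companions supersede.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  col1 = c("Z2", "A2", "B2", "C2", "A2", "E2", "F2", "G2"),
  col2 = c("Z2", "Z2", "A2", "B2", "C2", "D2", "A2", "F2"),
  col3 = c("A2", "B2", "C2", "D2", "E2", "F2", "G2", "Z2")
)

# Keep rows where at least one column starts with "A"
any_rows <- df %>%
  filter(if_any(col1:col3, ~ str_detect(.x, "^A")))

# Keep rows where every column starts with "A" (none in this data)
all_rows <- df %>%
  filter(if_all(col1:col3, ~ str_detect(.x, "^A")))

# The if_any() call written out by hand with |
explicit_rows <- df %>%
  filter(str_detect(col1, "^A") |
         str_detect(col2, "^A") |
         str_detect(col3, "^A"))

# Older scoped variant, superseded but still available in dplyr
scoped_rows <- df %>%
  filter_at(vars(col1:col3), any_vars(str_detect(., "^A")))

nrow(any_rows)
nrow(all_rows)
```

Here if_any() keeps five rows and if_all() keeps none, so the if_all() call behaves like the original filter(across(...)) attempt.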