Say I have the following data.frame df:
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello        10          3
# 2 WHAT_hello  WHAT_hello WHAT_hello         1          2
# 3      Hello       Hello      Hello         9          1
I would like to process the data.frame to retain only those rows that contain the prefix WHAT_ in at least one of col1, col2, or col3.
Now I know that I can do this easily with |, but I was trying to achieve this using dplyr::across and tidyselect::matches along with base::any and stringr::str_detect to point dplyr::filter at the right columns. But this doesn't seem to work, even when used in conjunction with dplyr::rowwise.
So what is the correct way to go about this here? What am I doing wrong?
I would like to use across + any primarily because I might not know in advance how many of these columns the actual dataset will have.
Here's my example (data + code) below:
#Libraries.
library(base)
library(dplyr)
library(tidyselect)
library(stringr)
library(magrittr)
#Toy data.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3),
                 othercol11 = sample(1:10, 3),
                 stringsAsFactors = FALSE)
#Works.
df %>%
  filter(str_detect(col1, "^WHAT_") | str_detect(col2, "^WHAT_") | str_detect(col3, "^WHAT_"))
#Output.
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4
#Runs, but gives the wrong result (keeps every row).
df %>%
  filter(
    across(.cols = matches("^col"),
           .fns = ~ any(str_detect(.x, "^WHAT")))
  )
#Output.
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4
# 3      Hello       Hello      Hello         4          7
#Also runs, but gives the wrong result (keeps only the row where all three columns match).
df %>%
  rowwise() %>%
  filter(
    across(.cols = matches("^col"),
           .fns = ~ any(str_detect(.x, "^WHAT")))
  )
#Output.
#   col1       col2       col3       othercol1 othercol11
#   <chr>      <chr>      <chr>          <int>      <int>
# 1 WHAT_hello WHAT_hello WHAT_hello         5          4
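As for what goes wrong in the two across attempts: as far as I can tell, across applies the function to each selected column separately, and filter then keeps a row only when every resulting column is TRUE (the columns are combined with AND, not OR). Without rowwise, any() collapses each whole column to a single TRUE, so every row passes. With rowwise, each column gives one TRUE/FALSE per row, and only rows where all three match survive. A minimal sketch to make those intermediate values visible (the names col_level and row_level are mine, and the numeric columns are left out for brevity):

```r
library(dplyr)
library(stringr)

# Same string columns as in the question.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 stringsAsFactors = FALSE)

# Without rowwise(): any() sees a whole column at once and returns
# a single logical per column.
col_level <- df %>%
  summarise(across(matches("^col"), ~ any(str_detect(.x, "^WHAT"))))

# With rowwise(): the function sees one value at a time, so each
# column becomes a per-row logical.
row_level <- df %>%
  rowwise() %>%
  mutate(across(matches("^col"), ~ any(str_detect(.x, "^WHAT")))) %>%
  ungroup()

print(col_level)
print(row_level)
```

Every entry of col_level is TRUE, which is why the first attempt keeps all rows. In row_level only the second row is TRUE in all three columns, which is why the rowwise attempt keeps just that row.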
For functions that apply to rows rather than columns, you can use c_across with rowwise:
df %>%
  rowwise() %>%
  filter(any(str_detect(c_across(matches('^col')), '^WHAT')))
# # A tibble: 2 x 5
# # Rowwise:
#   col1       col2        col3       othercol1 othercol11
#   <chr>      <chr>       <chr>          <int>      <int>
# 1 Hello      WHAT_hello2 Hello              9          7
# 2 WHAT_hello WHAT_hello  WHAT_hello         3         10
Or, using across with rowSums:
row_lgl <-
  df %>%
  transmute(across(.cols = matches("^col"), .fns = ~ str_detect(.x, "^WHAT"))) %>%
  rowSums %>%
  '>'(0)

df %>%
  filter(row_lgl)
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         9          7
# 2 WHAT_hello  WHAT_hello WHAT_hello         3         10
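Another option, assuming dplyr 1.0.4 or later is installed, is if_any, which was added for this "at least one column matches" case and avoids both rowwise and the intermediate logical vector. This is a sketch rather than the one right way. The fixed values in othercol1 and othercol11 are an assumption copied from the first printout in the question so that the example is reproducible, and kept is a name I made up:

```r
library(dplyr)
library(stringr)

# Toy data from the question, with fixed numbers instead of sample().
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# Keep rows where at least one column whose name starts with "col"
# has a value starting with "WHAT_".
kept <- df %>%
  filter(if_any(matches("^col"), ~ str_detect(.x, "^WHAT_")))

print(kept)
```

This keeps rows 1 and 2. Because matches("^col") selects by name, it picks up however many such columns the real dataset has, and if_all is the counterpart for requiring every column to match.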