I have this data:
# A tibble: 20 x 6
ID style param1 param2 param3 param4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 ar R78 NA NA NA
2 2 bg NA NA NA NA
3 3 bh NA NA NA NA
4 4 ar NA R78 NA NA
5 5 bg NA NA NA NA
6 6 bh NA NA NA NA
7 7 ar R78 NA NA NA
8 8 bg NA NA R78 NA
9 9 bh NA NA NA NA
10 10 ar NA R78 NA NA
11 11 bg NA NA NA NA
12 12 bh NA NA R78 NA
13 13 ar NA NA NA NA
14 14 bg R78 NA NA NA
15 15 bh NA NA NA NA
16 16 ar NA NA NA NA
17 17 bg NA NA NA NA
18 18 bh R78 NA NA NA
19 19 ar NA NA NA R78
20 20 bg NA NA NA NA
I want to use dplyr::filter to select the rows where R78 appears in any of the columns param1, param2, param3 or param4.
I tried:
data %>%
filter(across(param1:param4) == "R78")
which returns:
# A tibble: 4 x 6
ID style param1 param2 param3 param4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 ar R78 NA NA NA
2 7 ar R78 NA NA NA
3 14 bg R78 NA NA NA
4 18 bh R78 NA NA NA
This is the same result I get with data %>% filter(param1 == "R78")
...
Maybe I am misusing the across function. I have also tried chaining several "|" conditions, but that never worked.
I expect my code to return a tibble containing only rows 1, 4, 7, 8, 10, 12, 14, 18 and 19.
Thanks!
across works column-wise. In such cases I think it is better to use filter_at:
library(dplyr)
df %>% filter_at(vars(param1:param4), any_vars(. == 'R78'))
# ID style param1 param2 param3 param4
#1 1 ar R78 <NA> <NA> <NA>
#4 4 ar <NA> R78 <NA> <NA>
#7 7 ar R78 <NA> <NA> <NA>
#8 8 bg <NA> <NA> R78 <NA>
#10 10 ar <NA> R78 <NA> <NA>
#12 12 bh <NA> <NA> R78 <NA>
#14 14 bg R78 <NA> <NA> <NA>
#18 18 bh R78 <NA> <NA> <NA>
#19 19 ar <NA> <NA> <NA> R78
A hack to make across work is to use Reduce:
df %>% filter(Reduce(`|`, across(param1:param4, ~. == 'R78')))
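As a side note, if your dplyr is 1.0.4 or newer, if_any() should let you write the same "any of these columns" condition directly, and the scoped verbs such as filter_at() are marked as superseded in the dplyr documentation. A minimal sketch, where the small df below is a made-up stand-in for the data in the question:

```r
library(dplyr)

# Hypothetical stand-in data with the same column layout as in the question
df <- tibble(
  ID     = 1:6,
  style  = c("ar", "bg", "bh", "ar", "bg", "bh"),
  param1 = c("R78", NA, NA, NA, NA, NA),
  param2 = c(NA, NA, NA, "R78", NA, NA),
  param3 = c(NA, NA, NA, NA, "R78", NA),
  param4 = rep(NA_character_, 6)
)

# Keep a row when the condition is TRUE in at least one of param1..param4.
# Rows where every comparison is NA are dropped by filter().
res <- df %>% filter(if_any(param1:param4, ~ .x == "R78"))
res$ID
```

In this sketch only the rows with ID 1, 4 and 5 should be kept, since those are the only ones with R78 in one of the param columns.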
In base R, you can use rowSums:
cols <- paste0('param', 1:4)
df[rowSums(df[cols] == 'R78', na.rm = TRUE) > 0, ]
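To see why na.rm = TRUE matters in the rowSums approach, here is a small sketch on a made-up two-column data frame (m, hits and keep are names chosen only for this illustration). Comparing NA with a string gives NA, so without na.rm the row sums become NA as well:

```r
# Hypothetical toy data: two param columns, mostly NA
m <- data.frame(
  param1 = c("R78", NA, NA),
  param2 = c(NA, "R78", NA)
)

# Comparing a data frame with a string gives a logical matrix;
# cells that were NA stay NA in the result
hits <- m == "R78"

# Without na.rm, any NA in a row makes that row's sum NA
sums_with_na <- rowSums(hits)

# With na.rm = TRUE the NAs are ignored, so the sum counts the TRUE cells
keep <- rowSums(hits, na.rm = TRUE) > 0

m[keep, ]
```

Here the first two rows are kept and the all-NA third row is dropped.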