Here is some sample data:
library(tidyverse)
data <- matrix(runif(20), ncol = 4)
colnames(data) <- c("mt100", "cp001", "cp002", "cp003")
data <- as_tibble(data)
The real data set has many more columns, but the point stands: many of them start with "cp". In dplyr I can select all of these columns:
data %>%
  select(starts_with("cp"))
Is there a way to use starts_with (or a similar function) to filter on multiple columns without having to write them all out explicitly? I'm thinking of something like this:
data %>%
  filter(starts_with("cp") > 0.2)
We could use if_all or if_any, as Anil points out in his comments. Both were introduced in dplyr 1.0.4 (see the link and quote below); the code for your case follows.
https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/
if_any() and if_all()
"across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any()."
if_all:
data %>%
  filter(if_all(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.688 0.402 0.467 0.646
2 0.663 0.757 0.728 0.335
3 0.472 0.533 0.717 0.638
if_any:
data %>%
  filter(if_any(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.554 0.970 0.874 0.187
2 0.688 0.402 0.467 0.646
3 0.658 0.850 0.00813 0.542
4 0.663 0.757 0.728 0.335
5 0.472 0.533 0.717 0.638
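The quoted passage says the difficulty is how per-column results get combined into one logical vector. As a rough sketch of what the two helpers do, if_all behaves like AND-ing the per-column conditions and if_any like OR-ing them. The example below uses a small hypothetical tibble called toy (not the random data from the question) so that the values are fixed, and it assumes dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical toy data with fixed values (an assumption for illustration,
# not the runif() data from the question).
toy <- tibble(
  mt100 = c(0.5, 0.6, 0.7),
  cp001 = c(0.1, 0.3, 0.9),
  cp002 = c(0.1, 0.1, 0.8)
)

# One logical vector per "cp" column: is each value above 0.2?
per_col <- lapply(select(toy, starts_with("cp")), function(x) x > 0.2)

# Combine the per-column vectors by hand.
all_rows <- Reduce(`&`, per_col)  # TRUE where every cp column passes
any_rows <- Reduce(`|`, per_col)  # TRUE where at least one cp column passes

by_hand_all <- toy[all_rows, ]
by_hand_any <- toy[any_rows, ]

# The same filters written with if_all() and if_any().
with_if_all <- toy %>% filter(if_all(starts_with("cp"), ~ . > 0.2))
with_if_any <- toy %>% filter(if_any(starts_with("cp"), ~ . > 0.2))
```

Here by_hand_all and with_if_all should contain the same rows, and likewise for the if_any pair. Every row kept by if_all is also kept by if_any, which is why the if_any result above is the larger of the two.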