Objective: Remove columns if their name starts with XXX and the rows meet a criteria. For example, in the dataset below, remove all "fillerX" columns which only contain zeros.
Data:
iris %>%
tibble() %>%
slice(1:5) %>%
mutate(
fillerQ = rep(0,5),
fillerW = rep(0,5),
fillerE = rep(0,5),
fillerR = c(0,0,1,0,0),
fillerT = rep(0,5),
fillerY = rep(0,5),
fillerU = rep(0,5),
fillerI = c(0,0,0,0,1),
fillerO = rep(0,5),
fillerP = rep(0,5),
)
# A tibble: 5 × 15
Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerQ fillerW fillerE fillerR fillerT fillerY fillerU fillerI fillerO fillerP
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 0 0 0 0 0 0 0 0 0 0
2 4.9 3 1.4 0.2 setosa 0 0 0 0 0 0 0 0 0 0
3 4.7 3.2 1.3 0.2 setosa 0 0 0 1 0 0 0 0 0 0
4 4.6 3.1 1.5 0.2 setosa 0 0 0 0 0 0 0 0 0 0
5 5 3.6 1.4 0.2 setosa 0 0 0 0 0 0 0 1 0 0
Problem: We can use starts_with("filler")
to reference the filler columns, and we can use select_if(~ sum(abs(.)) != 0)
to keep non-zero columns, but we cannot put starts_with()
inside of select_if()
, since we will get the error:
Error:
! `starts_with()` must be used within a *selecting* function.
ℹ See ?tidyselect::faq-selection-context for details.
Run `rlang::last_trace()` to see where the error occurred.
Question: How do you combine starts_with()
and select_if()
?
select_if()
has been superseded. Use where()
inside select()
instead.
library(dplyr)
df %>%
select(!(starts_with("filler") & where(~ all(.x == 0))))
# # A tibble: 5 × 7
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerR fillerI
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0 0
# 2 4.9 3 1.4 0.2 setosa 0 0
# 3 4.7 3.2 1.3 0.2 setosa 1 0
# 4 4.6 3.1 1.5 0.2 setosa 0 0
# 5 5 3.6 1.4 0.2 setosa 0 1