Search code examples
rdataframeselectdplyrstartswith

Removing columns which start with XXX and meet a criteria


Objective: Remove columns if their name starts with XXX and the rows meet a criteria. For example, in the dataset below, remove all "fillerX" columns which only contain zeros.

Data:

iris %>% 
    tibble() %>% 
    slice(1:5) %>% 
    mutate(
        fillerQ = rep(0,5),
        fillerW = rep(0,5),
        fillerE = rep(0,5),
        fillerR = c(0,0,1,0,0),
        fillerT = rep(0,5),
        fillerY = rep(0,5),
        fillerU = rep(0,5),
        fillerI = c(0,0,0,0,1),
        fillerO = rep(0,5),
        fillerP = rep(0,5),
    )
# A tibble: 5 × 15
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerQ fillerW fillerE fillerR fillerT fillerY fillerU fillerI fillerO fillerP
         <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1          5.1         3.5          1.4         0.2 setosa        0       0       0       0       0       0       0       0       0       0
2          4.9         3            1.4         0.2 setosa        0       0       0       0       0       0       0       0       0       0
3          4.7         3.2          1.3         0.2 setosa        0       0       0       1       0       0       0       0       0       0
4          4.6         3.1          1.5         0.2 setosa        0       0       0       0       0       0       0       0       0       0
5          5           3.6          1.4         0.2 setosa        0       0       0       0       0       0       0       1       0       0

Problem: We can use starts_with("filler") to reference the filler columns, and we can use select_if(~ sum(abs(.)) != 0) to keep non-zero columns, but we cannot put starts_with() inside of select_if(), since we will get the error:

Error:
! `starts_with()` must be used within a *selecting* function.
ℹ See ?tidyselect::faq-selection-context for details.
Run `rlang::last_trace()` to see where the error occurred.

Question: How do you combine starts_with() and select_if()?


Solution

  • select_if() has been superseded. Use where() inside select() instead.

    library(dplyr)
    
    df %>%
      select(!(starts_with("filler") & where(~ all(.x == 0))))
    
    # # A tibble: 5 × 7
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerR fillerI
    #          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>   <dbl>
    # 1          5.1         3.5          1.4         0.2 setosa        0       0
    # 2          4.9         3            1.4         0.2 setosa        0       0
    # 3          4.7         3.2          1.3         0.2 setosa        1       0
    # 4          4.6         3.1          1.5         0.2 setosa        0       0
    # 5          5           3.6          1.4         0.2 setosa        0       1