Tags: r, dplyr, tidyr, grepl

Retain select columns and filter the rest based on string


I have a dataset in which I want to select columns based on a specific string, while also retaining two other columns regardless of their contents.

For example:

help <- data.frame(
data  = c("type", "100", "100", "110", "110", "110"),
user1 = c("red", "yes", "no", "yes", "no", "yes"),
user2 = c("blue", "yes", "no", "yes", "no", "yes"),
user3 = c("red", "yes", "no", "yes", "no", "yes"),
user4 = c("blue", "yes", "no", "yes", "no", "yes"),
more_data = c(5, 3, 6, 3, 4, 3))

I'm hoping to keep only the user columns that have the color "red" in their first row, while also retaining data and more_data.

For example, my end dataset would look like this:

  data user1 user3 more_data
1 type   red   red         5
2  100   yes   yes         3
3  100    no    no         6
4  110   yes   yes         3
5  110    no    no         4
6  110   yes   yes         3

Is this some sort of filter + grepl command where I filter on the reverse of "blue"? I tried filter(help, grepl(!"blue", help)), but that doesn't work.


Solution

  • We can use select with where to keep the columns whose first element is 'red':

    library(dplyr)
    # Keep `data` and `more_data`, plus every column whose first value is 'red'
    help %>% 
       select(data, where(~ 'red' %in% first(.)), more_data)
    

    -output

    #  data user1 user3 more_data
    #1 type   red   red         5
    #2  100   yes   yes         3
    #3  100    no    no         6
    #4  110   yes   yes         3
    #5  110    no    no         4
    #6  110   yes   yes         3
    

    data

    help <- structure(list(data = c("type", "100", "100", "110", "110", "110"
    ), user1 = c("red", "yes", "no", "yes", "no", "yes"), user2 = c("blue", 
    "yes", "no", "yes", "no", "yes"), user3 = c("red", "yes", "no", 
    "yes", "no", "yes"), user4 = c("blue", "yes", "no", "yes", "no", 
    "yes"), more_data = c(5, 3, 6, 3, 4, 3)), class = "data.frame",
    row.names = c(NA, 
    -6L))
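
For comparison, the same column selection can be sketched in base R. This is an illustrative alternative, not part of the original answer. It assumes the user columns can be recognised by a name starting with "user"; the names user_cols, is_red, keep and result are made up for this sketch. It also shows where grepl fits: filter() drops rows, whereas the task here is choosing columns, and an expression like !"blue" fails because ! cannot be applied to a character string (negation would go outside the call, as in !grepl("blue", x)).

```r
# Example data from the question, with `data` stored as character
help <- data.frame(
  data      = c("type", "100", "100", "110", "110", "110"),
  user1     = c("red", "yes", "no", "yes", "no", "yes"),
  user2     = c("blue", "yes", "no", "yes", "no", "yes"),
  user3     = c("red", "yes", "no", "yes", "no", "yes"),
  user4     = c("blue", "yes", "no", "yes", "no", "yes"),
  more_data = c(5, 3, 6, 3, 4, 3),
  stringsAsFactors = FALSE
)

# Assumption: user columns are the ones whose names start with "user"
user_cols <- grep("^user", names(help), value = TRUE)

# For each user column, test whether its first-row value matches "red"
is_red <- vapply(help[user_cols],
                 function(col) grepl("red", col[1]),
                 logical(1))

# Always keep `data` and `more_data`; add only the matching user columns
keep <- c("data", user_cols[is_red], "more_data")
result <- help[, keep]
```

Restricting the test to the user columns means data and more_data are never checked against the pattern, so they are retained whatever their first-row values are.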