
Using R's tidyverse, what is the most efficient way to filter out data that meet conditions across multiple columns?


I have a dataset from which I want to remove a person only if their favorite color is green and their favorite food is sushi. If the person meets just one of those criteria, I want to keep them. With that context, how can I most efficiently transform this dataset:

library(dplyr)

test <- tibble(person = c("Justin", "Corey", "Kate", "Sibley"),
               fav_food = c("sushi", "sushi", "cake", "tomatos"),
               fav_color = c("green", "red", "green", "blue"))

to this dataset?

library(dplyr)

answer <- tibble(person = c("Corey", "Kate", "Sibley"),
                 fav_food = c("sushi", "cake", "tomatos"),
                 fav_color = c("red", "green", "blue"))

My current solution is to make a new variable that is the combination of those two columns, but I feel as if there must be a more straightforward solution than this:

library(dplyr)

# This code works, but I'm curious whether there is a more straightforward approach

test %>%
  mutate(food_color = paste(fav_food, fav_color, sep = "-")) %>%
  filter(food_color != "sushi-green")

Solution

  • Combine your conditions with & and wrap the whole expression in !( ) to keep only the rows that don't meet both criteria:

    test %>% 
      filter(!(fav_food == "sushi" & fav_color == "green"))
    

    Output:

    # A tibble: 3 × 3
      person fav_food fav_color
      <chr>  <chr>    <chr>    
    1 Corey  sushi    red      
    2 Kate   cake     green    
    3 Sibley tomatos  blue
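
Two follow-up points may be worth checking against your own data. By De Morgan's law, !(A & B) can also be written as !A | !B, so the same filter can be expressed without the outer negation. Also, filter() drops rows where the condition evaluates to NA, so a row with a missing value in one column can disappear even though it does not match both criteria. The sketch below rebuilds the question's test tibble and adds a hypothetical person ("Alex") with a missing fav_color purely for illustration; that row is an assumption and is not part of the original data.

```r
library(dplyr)

test <- tibble(person = c("Justin", "Corey", "Kate", "Sibley"),
               fav_food = c("sushi", "sushi", "cake", "tomatos"),
               fav_color = c("green", "red", "green", "blue"))

# Form used in the solution: negate the combined condition
negated <- test %>%
  filter(!(fav_food == "sushi" & fav_color == "green"))

# De Morgan's law: !(A & B) is equivalent to (!A | !B)
either <- test %>%
  filter(fav_food != "sushi" | fav_color != "green")

# Hypothetical extra row with a missing color (not in the original data)
with_na <- bind_rows(
  test,
  tibble(person = "Alex", fav_food = "sushi", fav_color = NA_character_)
)

# TRUE & NA is NA, !NA is still NA, and filter() drops NA rows,
# so Alex is removed here along with Justin
dropped <- with_na %>%
  filter(!(fav_food == "sushi" & fav_color == "green"))

# coalesce() turns an NA condition into FALSE ("not a match"),
# so Alex is kept and only Justin is removed
kept <- with_na %>%
  filter(!coalesce(fav_food == "sushi" & fav_color == "green", FALSE))
```

If your columns never contain NA, the plain negated form is enough; the coalesce() variant only matters when missing values should count as "not a match".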