Tags: r, dplyr, tidyverse

Remove rows with conditions using dplyr


I'd like to remove rows from my data frame, which looks like this:

df <- data.frame(col1 = c("a", "a", "m", "m", "m", "m", "n", "q"),
                 col2 = c("a", "b", "m", "x", "y", "z", NA, "p"))
  col1 col2
1    a    a
2    a    b
3    m    m
4    m    x
5    m    y
6    m    z
7    n <NA>
8    q    p

I'm only concerned with a and m in col1, because those are the col1 values that also appear in col2. For those values, I would like to remove the rows where col1 and col2 don't match; all other rows should be kept. Note: the df above is just a reproducible example of my much larger dataset, so hard-coding individual values like 'a' or 'm' isn't an option.

My desired outcome

  col1 col2
1    a    a
2    m    m
3    n <NA>
4    q    p

Any suggestions? Thanks a lot for your help!


Solution

  • You can try this:

    library(dplyr)
    # Keep rows where the columns match, or where col1 never appears in col2
    df %>%
        filter(col1 == col2 | !col1 %in% intersect(col1, col2))
    

    which gives

      col1 col2
    1    a    a
    2    m    m
    3    n <NA>
    4    q    p
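
To see why the filter condition behaves this way, here is a minimal base R sketch that breaks it into its intermediate steps. The names `shared`, `same`, `not_shared`, `keep` and `result` are illustrative and not part of the original answer, and `stringsAsFactors = FALSE` is set explicitly on the assumption that the columns are meant to be character vectors (the default in R 4.0 and later).

```r
# Same example data as in the question, with character columns
df <- data.frame(col1 = c("a", "a", "m", "m", "m", "m", "n", "q"),
                 col2 = c("a", "b", "m", "x", "y", "z", NA, "p"),
                 stringsAsFactors = FALSE)

# Values of col1 that also occur somewhere in col2
shared <- intersect(df$col1, df$col2)

# Condition 1: both columns agree (this is NA where col2 is NA)
same <- df$col1 == df$col2

# Condition 2: col1 is not one of the shared values
not_shared <- !(df$col1 %in% shared)

# NA | TRUE evaluates to TRUE, so the row with the missing col2 is kept
keep <- same | not_shared

# which() drops any remaining NA, similar to how filter() treats NA conditions
result <- df[which(keep), ]
rownames(result) <- NULL
print(result)
```

Row 7 (n, NA) survives because its second condition is TRUE, which outweighs the NA from the comparison. A row whose combined condition came out as NA would be dropped, both here and by `filter()`.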