Search code examples
rdplyr

dplyr identify rows that do not have a matched column within group


I have a df that looks as follow:

GROUP    NAME     TITLE     LEVEL
  A      John      Lead       0
  A      John      Staff      1
  A      Jake      Jr         0
  B      Bob       Lead       1
  B      Bob       Lead       0
  C      Andrew    Jr         0
  C      Andrew    Jr         1
  C      Rebecca   Staff      0

What I am trying to do is identify names within each group that have a title in level 1 that they do not have for level 0. In this example I would be looking for:

GROUP    NAME     TITLE     LEVEL
  A      John      Staff      1

Because in group A John has a title of staff for level 1 but not for level 0.


Solution

  • The simplest approach I can think of would be an anti-join, where we look for GROUP/NAME/TITLE in Level 1 that is not in Level 0.

    anti_join(df |> filter(LEVEL == 1),
              df |> filter(LEVEL == 0),
              join_by(GROUP, NAME, TITLE))
    

    Result

    # A tibble: 1 × 4
      GROUP NAME  TITLE LEVEL
      <chr> <chr> <chr> <dbl>
    1 A     John  Staff     1