r dplyr duplicates aggregate data-transform

r retain duplicates after group by not min value


I have a dataset like this.

   ID    Group    Value    Col3
   1     z1       1.29     1
   1     z1       0.81     1
   2     z2       2.89     1
   2     z2       1.53     2
   3     z1       0.13     3
   3     z1       0.97     3
   4     z3       10.75    3
   4     z3       8.13     2
   5     x2       0.45     1
   5     x2       1.43     3

How do I retain only the rows where Col3 = 2 when duplicates are identified based on group_by(ID, Group)?

Expected results

   ID    Group    Value    Col3
   1     z1       1.29     1
   1     z1       0.81     1

   2     z2       1.53     2

   3     z1       0.13     3
   3     z1       0.97     3
  
   4     z3       8.13     2

   5     x2       0.45     1
   5     x2       1.43     3

Please note that one row each in ID 2 and ID 4 is excluded, and only the row where Col3 = 2 is retained. Thanks in advance for any help.


Solution

  • It seems that if a group contains a row with Col3 == 2 you want to keep just that row, and otherwise keep all rows of the group. This seems to do the trick:

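    library(dplyr)
    # dd is the data frame from the question.
    # Groups without any Col3 == 2 keep all rows; other groups keep only their Col3 == 2 rows.
    dd %>%
      group_by(ID, Group) %>%
      filter(!any(Col3 == 2) | (any(Col3 == 2) & Col3 == 2))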
    

    which returns

         ID Group Value  Col3
      <int> <chr> <dbl> <int>
    1     1 z1     1.29     1
    2     1 z1     0.81     1
    3     2 z2     1.53     2
    4     3 z1     0.13     3
    5     3 z1     0.97     3
    6     4 z3     8.13     2
    7     5 x2     0.45     1
    8     5 x2     1.43     3
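
For anyone who wants to reproduce this end to end, here is a minimal sketch. It rebuilds the sample data from the question as a data frame named `dd` (the name used in the answer; the integer and character column types are an assumption based on the printed output). The filter condition is a slightly shortened form of the one above: `!A | (A & B)` is logically the same as `!A | B`, so the second `any()` call can be dropped. The trailing `ungroup()` is optional and only added here so the result is a plain tibble.

```r
library(dplyr)

# Sample data copied from the question; column types are assumed.
dd <- data.frame(
  ID    = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  Group = c("z1", "z1", "z2", "z2", "z1", "z1", "z3", "z3", "x2", "x2"),
  Value = c(1.29, 0.81, 2.89, 1.53, 0.13, 0.97, 10.75, 8.13, 0.45, 1.43),
  Col3  = c(1L, 1L, 1L, 2L, 3L, 3L, 3L, 2L, 1L, 3L)
)

# Keep a row if its (ID, Group) group has no Col3 == 2 at all,
# or if the row itself has Col3 == 2.
result <- dd %>%
  group_by(ID, Group) %>%
  filter(!any(Col3 == 2) | Col3 == 2) %>%
  ungroup()

print(result)
```

Note that if a group held several rows with Col3 == 2, all of them would be kept; the sample data does not contain such a case.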