
Delete subjects that are not matched in R


I have a data frame like this one:

ID  matching_variable   status
 1     1                 case
 2     1                 control
 3     2                 case
 4     2                 case
 5     3                 control
 6     3                 control
 7     4                 case
 8     4                 control
 9     5                 case
10     6                 control

I would like to keep all the "pairs" of subjects that are matched (that have the same matching_variable) and that consist of 1 case and 1 control (such as the pairs with matching_variable = 1 or matching_variable = 4).

So I would like to remove the matched subjects for which there are only cases (such as matching_variable = 2) or only controls (such as matching_variable = 3), as well as the subjects that are alone because they have not been matched (such as the last 2 subjects).

The expected result would be this:

 ID matching_variable   status
  1         1           case
  2         1           control
  7         4           case
  8         4           control

I'm sure it's not too complicated but I have no idea how to go about it...

Thanks in advance for the help


Solution

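  • An idea via base R: group the rows by matching_variable with ave() and keep only the groups that contain more than one distinct status,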

    # TRUE for every row whose matching_variable group has more than one distinct status
    df[as.logical(with(df, ave(status, matching_variable, FUN = function(i) length(unique(i)) > 1))), ]
    
      ID matching_variable  status
    1  1                 1    case
    2  2                 1 control
    7  7                 4    case
    8  8                 4 control
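
If a group could ever hold more than two subjects (say 2 cases and 1 control), checking only for more than one distinct status would keep it. A stricter variant, sketched below, counts cases and controls per group and keeps only groups with exactly one of each. The data frame is rebuilt from the question so the snippet runs on its own; the names n_case, n_control, keep and result are illustrative only, and status is assumed to contain just the values "case" and "control".

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = 1:10,
  matching_variable = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  status = c("case", "control", "case", "case", "control",
             "control", "case", "control", "case", "control")
)

# Count cases and controls within each matching_variable group;
# ave() repeats the per-group count on every row of that group
n_case    <- ave(df$status == "case",    df$matching_variable, FUN = sum)
n_control <- ave(df$status == "control", df$matching_variable, FUN = sum)

# Keep only rows that belong to a group with exactly 1 case and 1 control
keep   <- n_case == 1 & n_control == 1
result <- df[keep, ]
print(result)
```

On the example data this should leave the same four rows (IDs 1, 2, 7 and 8) as the one-liner above. Because the comparison is done on logical vectors, it should also behave the same whether status is stored as character or as a factor.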