
Matching values in multiple columns in R based on condition


Say I have a data frame df:

resident    faculty    submittedBy    match    caseID    phase

george      sally      george         1        george_1  pre
george      sally      sally          0        george_1  pre
george      sally      george         1        george_1  intra
jane        carl       jane           1        jane_1    pre
jane        carl       carl           0        jane_1    pre
jane        carl       carl           0        jane_1    intra

and I want to add a column df$response to this dataframe according to the following parameters (I'm thinking I need a set of nested ifelses, but I'm struggling to execute it correctly):

For a given row X where df$match = 1,

put "1" in df$response if:

at least one row where df$match = 0 has the same values in df$caseID, df$faculty, and df$phase as row X. Otherwise (including every row where df$match = 0), put "0".

So the output should be this:

response

1
0
0
1
0
0

because only the first and fourth rows have df$match = 1 and share their df$caseID, df$faculty, and df$phase values with a row where df$match = 0.


Solution

  • Here is how I'd do it:

    # read the data
    test <- read.table(text = 'resident    faculty    submittedBy    match    caseID    phase
                       george      sally      george         1        george_1  pre
                       george      sally      sally          0        george_1  pre
                       george      sally      george         1        george_1  intra
                       jane        carl       jane           1        jane_1    pre
                       jane        carl       carl           0        jane_1    pre
                       jane        carl       carl           0        jane_1    intra', header = TRUE)
    
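    # create the response vector, one entry per row
    resp <- numeric(nrow(test))
    
    # iterate over each row
    for (rr in seq_len(nrow(test))){
      if (test$match[rr] == 0){
        resp[rr] <- 0
      }
      else{
        # stack the match == 0 rows and append row rr onto the end
        tmp <- rbind(test[test$match == 0, c('faculty', 'caseID', 'phase')],
                     test[rr, c('faculty', 'caseID', 'phase')])
        # row rr gets 1 if it duplicates one of the match == 0 rows
        resp[rr] <- ifelse(duplicated(tmp)[nrow(tmp)], 1, 0)
      }
    }
    
    # attach the result to the data frame
    test$response <- resp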
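
    The same rule can probably be written without a loop. The following is only a sketch in base R, not the original answer's method: it rebuilds the question's table by hand, pastes the three columns into a single key per row (key and zero_keys are names made up for this illustration), and assumes none of the values contain the "|" separator.

```r
# Rebuild the example data with the same values as the question's table
df <- data.frame(
  resident    = c("george", "george", "george", "jane", "jane", "jane"),
  faculty     = c("sally", "sally", "sally", "carl", "carl", "carl"),
  submittedBy = c("george", "sally", "george", "jane", "carl", "carl"),
  match       = c(1, 0, 1, 1, 0, 0),
  caseID      = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase       = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

# One key per row from the three columns that must agree
# (assumption: "|" never occurs inside the values themselves)
key <- paste(df$caseID, df$faculty, df$phase, sep = "|")

# Keys that occur on at least one row where match == 0
zero_keys <- unique(key[df$match == 0])

# 1 only when the row has match == 1 and its key also appears among the match == 0 rows
df$response <- as.integer(df$match == 1 & key %in% zero_keys)

print(df$response)
```

    On the example data this should print 1 0 0 1 0 0, which agrees with the expected output in the question. Unlike the loop, it compares each row only against the match == 0 rows in a single vectorized step, which may matter on larger data frames.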