How to subset data in R: participant only needs to meet one of five criteria?


I'm having a lot of trouble figuring out how to subset a data set in R despite reading through many pages here. The set contains information from over 3000 participants. Each participant was asked about five different health conditions and gave binary answers (i.e., yes/no diabetes; yes/no obesity, etc.). How do I make a subset that includes people who have only ONE of the conditions? For instance, everyone in this new subset would have either obesity or diabetes or high cholesterol, but none would have two or more conditions.

Thank you!!

ETA: After a night's sleep, I looked at everything (and the comments) again. Here's some clarification and what I've done since.

Sample data (mydata) (0 = no, 1 = yes)

Participant  HighCho  Diabetes  Obesity
1              1        1        0
2              0        1        1
3              1        0        0
4              0        0        0
5              0        1        0     

I want my subset outcome to include only those with none of the three conditions or only one of the three:

Participant  HighCho  Diabetes  Obesity
3              1        0        0
4              0        0        0
5              0        1        0

I've written:

new.data <- subset(mydata, (HighCho == 0 & Diabetes == 0 & Obesity == 0) | HighCho == 1 | Diabetes == 1 | Obesity == 1)

My problem is that even though I capture everyone who is free from all conditions, I still include people who have more than one condition. I thought with my "or" statement, I would only include those with only one of the three conditions (rather than two). Any insights as to what I might be doing incorrectly?


Solution

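  • The condition in the question keeps anyone with at least one condition, because `HighCho == 1 | Diabetes == 1 | Obesity == 1` is also true for participants with two or more. Instead, use the apply function to count how many conditions each participant has, then keep the rows where that count is 0 or 1.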

    mydata[apply(mydata[, c('HighCho', 'Diabetes', 'Obesity')], 1, sum) %in% 0:1, ]
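
For a reproducible check, here is a minimal sketch that rebuilds the sample data from the question and applies the same counting idea. It swaps in `rowSums()` as a vectorised equivalent of `apply(..., 1, sum)`; the helper names `conditions`, `n_conditions`, and `original` are made up for illustration, and it assumes the columns contain only 0 and 1 with no missing values.

```r
# Sample data copied from the question (0 = no, 1 = yes)
mydata <- data.frame(
  Participant = 1:5,
  HighCho     = c(1, 0, 1, 0, 0),
  Diabetes    = c(1, 1, 0, 0, 1),
  Obesity     = c(0, 1, 0, 0, 0)
)

# Hypothetical helper: the names of the condition columns
conditions <- c("HighCho", "Diabetes", "Obesity")

# Number of conditions each participant has
n_conditions <- rowSums(mydata[, conditions])

# Keep participants with zero or one condition
new.data <- mydata[n_conditions <= 1, ]
new.data$Participant
# [1] 3 4 5

# The filter from the question, for comparison: every row either has
# no conditions or has at least one, so nothing is dropped
original <- subset(mydata, (HighCho == 0 & Diabetes == 0 & Obesity == 0) |
                     HighCho == 1 | Diabetes == 1 | Obesity == 1)
nrow(original)
# [1] 5
```

To keep only participants with exactly one condition, as in the first version of the question, the filter would be `n_conditions == 1`. If the real data contains `NA` answers, the counts for those rows will be `NA` as well, so they would need to be handled before subsetting.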