
Combining multiple variables into a new variable in R


This is probably very simple for someone, but I can't make it work for the life of me. I've tried using cut and ifelse, but I get levels without the values I want. Any thoughts would be much appreciated. Here's some fake data:

 o5<-c(1,0,2,0,0,NA)
 o6<-c(NA,0,NA,2,0,NA)
 o7<-c(0,0,NA,2,2,1)
 ID<-seq(1,6,1)
 d1<-cbind(ID,o5,o6,o7)

     ID o5 o6 o7
[1,]  1  1 NA  0
[2,]  2  0  0  0
[3,]  3  2 NA NA
[4,]  4  0  2  2
[5,]  5  0  0  2
[6,]  6 NA NA  1

I'm trying to combine o5, o6, and o7 into an o_all variable that would look like this:

     ID o5 o6 o7 o_all
[1,]  1  1 NA  0  5
[2,]  2  0  0  0  0
[3,]  3  2 NA NA  5
[4,]  4  0  2  2  6
[5,]  5  0  0  2  7
[6,]  6 NA NA  1  7

Each o variable corresponds to a student's grade level (o5 is grade 5, o6 is grade 6, o7 is grade 7). If a student has a nonzero value for a grade, they should get that grade level in o_all (this is the grade in which onset of a specific behavior was witnessed). If they have nonzero values in two or more grades, I select the earliest grade (ID #4 is an example of this). I also have quite a bit of missing data that I need to account for. Thanks!


Solution

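    # For each row, flag grades with a recorded, nonzero value (NA counts as none).
    d1 <- cbind(d1, o_all = apply(d1[, -1], 1, function(x) {
      pos <- !is.na(x) & x > 0
      # Earliest flagged grade: column 1 is grade 5, so add 4. No flag gives 0;
      # any(pos) also covers rows whose first value is NA.
      if (any(pos)) which.max(pos) + 4 else 0
    }))
    #     ID o5 o6 o7 o_all
    #[1,]  1  1 NA  0     5
    #[2,]  2  0  0  0     0
    #[3,]  3  2 NA NA     5
    #[4,]  4  0  2  2     6
    #[5,]  5  0  0  2     7
    #[6,]  6 NA NA  1     7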
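
With a lot of rows, a vectorized variant of the apply() approach above might be worth trying. The following is only a sketch under some assumptions: the o columns are taken to map to grades 5 to 7 in order (the grades vector is a hypothetical helper, not from the question), and treating rows where every grade is missing as NA is one possible choice for the missing data, not something the question specifies.

```r
# Rebuild the sample data from the question.
o5 <- c(1, 0, 2, 0, 0, NA)
o6 <- c(NA, 0, NA, 2, 0, NA)
o7 <- c(0, 0, NA, 2, 2, 1)
d1 <- cbind(ID = 1:6, o5, o6, o7)

grades <- c(5, 6, 7)  # assumption: o5, o6, o7 are grades 5, 6, 7
x <- d1[, c("o5", "o6", "o7")]

# TRUE where a grade has a recorded, nonzero value; NA counts as "not seen".
pos <- !is.na(x) & x > 0

# Index of the first TRUE column in each row (pos * 1 makes the matrix numeric).
first <- max.col(pos * 1, ties.method = "first")

# Rows without any positive value get 0, as for ID 2 in the expected output.
o_all <- ifelse(rowSums(pos) > 0, grades[first], 0)

# Optional, assumed rule: if every grade is missing, report NA instead of 0.
o_all[rowSums(!is.na(x)) == 0] <- NA

d1 <- cbind(d1, o_all)
```

On the sample data this gives the same o_all column as the apply() version. Drop the optional line if rows with only missing values should stay at 0.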