Search code examples
rdplyr

How do I select a respondent's value if conditions are met by row?


I have a dataset that looks like this. For respondents who only picked one favorite fruit, I need to have variable returned that is a factor which is that respondent's favorite fruit. If a respondent picked more than one favorite fruit, their value should be NA.

fav_apple<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
fav_orange<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
fav_banana<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
df<-data.frame(fav_apple, fav_orange, fav_banana)
df$id<-seq(1,nrow(df))

I tried tackling this with pivot_longer and then repivotting, but I end up with duplicate rows.


Solution

  • Using ifelse on the rowSums. Since favorites is a numeric matrix apply can be used efficiently on as.matrix(df[,1:3])

    colnms <- colnames(df[,1:3])
    
    cbind(df, 
      favorite = as.factor(unlist(
        ifelse(rowSums(df[,1:3], na.rm=T) == 1, 
               apply(as.matrix(df[,1:3]), 1, \(x) colnms[which(!is.na(x))]), 
               NA))))
        fav_apple fav_orange fav_banana  id   favorite
    1          NA         NA         NA   1       <NA>
    2          NA         NA         NA   2       <NA>
    3          NA          1         NA   3 fav_orange
    4          NA         NA         NA   4       <NA>
    5           1          1          1   5       <NA>
    6          NA         NA         NA   6       <NA>
    7          NA         NA         NA   7       <NA>
    8          NA         NA         NA   8       <NA>
    9          NA         NA         NA   9       <NA>
    10         NA         NA         NA  10       <NA>
    11         NA         NA         NA  11       <NA>
    12         NA         NA         NA  12       <NA>
    13         NA         NA         NA  13       <NA>
    14         NA         NA         NA  14       <NA>
    15         NA         NA          1  15 fav_banana
    ...