I have a dataset that looks like this. For respondents who only picked one favorite fruit, I need to have variable returned that is a factor which is that respondent's favorite fruit. If a respondent picked more than one favorite fruit, their value should be NA
.
fav_apple<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
fav_orange<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
fav_banana<-sample(c(NA,1), size=100, replace=T, prob=c(0.9,0.1))
df<-data.frame(fav_apple, fav_orange, fav_banana)
df$id<-seq(1,nrow(df))
I tried tackling this with pivot_longer
and then repivotting, but I end up with duplicate rows.
Using ifelse
on the rowSums
. Since favorites is a numeric matrix apply
can be used efficiently on as.matrix(df[,1:3])
colnms <- colnames(df[,1:3])
cbind(df,
favorite = as.factor(unlist(
ifelse(rowSums(df[,1:3], na.rm=T) == 1,
apply(as.matrix(df[,1:3]), 1, \(x) colnms[which(!is.na(x))]),
NA))))
fav_apple fav_orange fav_banana id favorite
1 NA NA NA 1 <NA>
2 NA NA NA 2 <NA>
3 NA 1 NA 3 fav_orange
4 NA NA NA 4 <NA>
5 1 1 1 5 <NA>
6 NA NA NA 6 <NA>
7 NA NA NA 7 <NA>
8 NA NA NA 8 <NA>
9 NA NA NA 9 <NA>
10 NA NA NA 10 <NA>
11 NA NA NA 11 <NA>
12 NA NA NA 12 <NA>
13 NA NA NA 13 <NA>
14 NA NA NA 14 <NA>
15 NA NA 1 15 fav_banana
...