I would like to make an array that summarises the rows of a data frame with the unique values contained within said rows.
With the following example code:
ref <- c(1:8)
data1 <- c("A","","C","","","","A","")
data2 <- c("A","","","A","C","","","")
data3 <- c("","B","","","","","","B")
data4 <- c("A","B","","","","D","A","")
initial.data <- data.frame(ref, data1, data2, data3, data4)
I can obtain what I want with:
summary.data <- paste(initial.data[,2], initial.data[,3],
initial.data[,4], initial.data[,5], sep='')
desired.data <- substring(summary.data,1,1)
However, I would like a more parsimonious way of coding this and one that does not assume that each row may only take one value.
You can try
apply(initial.data[-1],1, function(x) unique(x[x!='']))
#[1] "A" "B" "C" "A" "C" "D" "A" "B"
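Note that when a row holds more than one distinct value, the apply call above returns a list rather than a character vector. If a single string per row is wanted, one possible sketch is to collapse the unique values with paste. The data frame multi.data, the helper collapse.unique and the "," separator below are made up for illustration and are not part of the question:

```r
# Hypothetical data: row 2 holds two distinct values ("B" and "C")
multi.data <- data.frame(ref   = 1:3,
                         data1 = c("A", "B", ""),
                         data2 = c("A", "C", ""),
                         data3 = c("",  "B", "D"),
                         stringsAsFactors = FALSE)

# Hypothetical helper: drop empty strings, keep unique values, join with ","
collapse.unique <- function(x) paste(unique(x[x != ""]), collapse = ",")

# One string per row; unname() drops any row names apply may carry over
summary.multi <- unname(apply(multi.data[-1], 1, collapse.unique))
summary.multi
#[1] "A"   "B,C" "D"
```

A row with no non-empty values would come out as "" under this approach.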
Or
substr(do.call(paste0, initial.data[-1]),1,1)
#[1] "A" "B" "C" "A" "C" "D" "A" "B"
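Or use max.col, which picks the first non-empty column in each row (ties.method='first' is set because the default breaks ties at random)
initial.data[cbind(1:nrow(initial.data),max.col(initial.data[-1]!='', ties.method='first')+1)]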
#[1] "A" "B" "C" "A" "C" "D" "A" "B"