R construct summary of values from columns


I would like to make an array that summarises the rows of a data frame with the unique values contained within said rows.

With the following example code:

ref <- c(1:8)

data1 <- c("A","","C","","","","A","")
data2 <- c("A","","","A","C","","","")
data3 <- c("","B","","","","","","B")
data4 <- c("A","B","","","","D","A","")

initial.data <- data.frame(ref, data1, data2, data3, data4)

I can obtain what I want with:

summary.data <- paste(initial.data[,2], initial.data[,3], 
                  initial.data[,4], initial.data[,5], sep='') 

desired.data <- substring(summary.data,1,1)

However, I would like a more parsimonious way of coding this and one that does not assume that each row may only take one value.


Solution

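  • You can try the following. In this sample every row has exactly one distinct non-empty value, so apply returns a character vector; if rows differ in how many distinct values they hold, it returns a list instead.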

     apply(initial.data[-1],1, function(x) unique(x[x!='']))
     #[1] "A" "B" "C" "A" "C" "D" "A" "B"
    

    Or

     substr(do.call(paste0, initial.data[-1]),1,1)
     #[1] "A" "B" "C" "A" "C" "D" "A" "B"
    

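    Or use max.col, which picks one non-empty column per row (ties.method = "first" makes that choice deterministic; the default breaks ties at random)

     initial.data[cbind(1:nrow(initial.data),max.col(initial.data[-1]!='', ties.method="first")+1)]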
     #[1] "A" "B" "C" "A" "C" "D" "A" "B"
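
The question also asks for something that does not assume a single value per row. A minimal sketch of one way to handle that is to collapse each row's distinct non-empty values into one string. The data frame `multi.data`, the helper `collapse_unique`, and the "," separator are all hypothetical and only there for illustration:

```r
# Hypothetical data (not from the question): row 1 holds two distinct values.
multi.data <- data.frame(
  ref   = 1:3,
  data1 = c("A", "",  "C"),
  data2 = c("B", "",  "C"),
  data3 = c("A", "D", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct non-empty values of one row.
collapse_unique <- function(x) paste(unique(x[x != ""]), collapse = ",")

# Apply the helper row by row over the data columns (everything except ref).
row.summary <- unname(apply(as.matrix(multi.data[-1]), 1, collapse_unique))
row.summary
#[1] "A,B" "D"   "C"
```

Because the helper always returns a single string, the result stays a plain character vector even when rows contain different numbers of distinct values. A row with no values at all would come out as an empty string.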