Find unique strings in data frame variables


I have a data frame with several character variables, and I want to find the unique string in each row. Each row contains a single string, possibly repeated across several columns, and the remaining cells are NA. For example, the data frame "df":

  Col1 Col2 Col3
1 ABC  ABC  NA
2  NA  DEF  DEF
3 GHI  NA   NA
4 JKL  JKL  JKL

As an output I would like to have

ABC
DEF
GHI
JKL

Best would be to have some kind of apply function for each row. I tried out several variations of

apply(df,1, function(x) unique(x))

But none of them worked. I suspect there is an easy way if you know the right function. How can I do that?


Solution

  • We can use is.na to drop the NA elements in each row before calling unique:

    unname(apply(df, 1, FUN = function(x) unique(x[!is.na(x)])))
    #[1] "ABC" "DEF" "GHI" "JKL"
    

    If a row contains more than one unique element, apply returns a list when the rows yield different numbers of elements, or a matrix when every row yields the same number (greater than one). In that case, we can paste the elements together with toString to get a single string per row:

    unname(apply(df, 1, FUN = function(x) toString(unique(x[!is.na(x)])))) 
    

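    Another option is pmax, provided there is only a single unique element per row. With na.rm=TRUE the NAs are ignored, so the parallel maximum of each row is simply that element. This assumes the columns are character vectors rather than factors: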

     do.call(pmax, c(df, list(na.rm=TRUE)))
     #[1] "ABC" "DEF" "GHI" "JKL"
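
The three approaches above can be tried end to end with the sketch below. It rebuilds the question's data frame by hand, assuming the columns are character vectors with real NA values (not the string "NA"). The names `df2` and `res_*` are made up for illustration, and `df2` is a hypothetical variant in which row 1 holds two distinct strings, to show what the list and toString cases look like.

```r
# Rebuild the example data frame from the question (character columns, real NAs).
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)

# Approach 1: drop NAs in each row, then keep the unique values.
res_unique <- unname(apply(df, 1, function(x) unique(x[!is.na(x)])))
# "ABC" "DEF" "GHI" "JKL"

# Approach 2: collapse each row's unique values into one string.
res_string <- unname(apply(df, 1, function(x) toString(unique(x[!is.na(x)]))))

# Approach 3: parallel maximum across the columns, ignoring NAs.
res_pmax <- do.call(pmax, c(df, list(na.rm = TRUE)))

# Hypothetical variant: row 1 now contains two distinct strings.
df2 <- df
df2$Col2[1] <- "XYZ"

# Rows yield different numbers of elements, so apply returns a list.
res_list <- unname(apply(df2, 1, function(x) unique(x[!is.na(x)])))

# toString gives one string per row again; row 1 becomes "ABC, XYZ".
res_collapsed <- unname(apply(df2, 1, function(x) toString(unique(x[!is.na(x)]))))
```

On the original `df` all three approaches give the same character vector. On `df2`, only the toString version still returns one string per row; pmax would silently pick the alphabetically largest string in row 1, which is why it is only suitable when each row has a single unique element.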