For each column in a data.frame, find the rows where that column is the only one with a positive value


For each column in a data.frame, I need to find the rows in which that column is the only one with a positive value, and then print the names of those rows in the output.

My data example:

id  A   B   C
s1  1   2   1
s2  1   0   0
s3  0   12  3
s4  0   1   0
s5  0   1   0

I'd like to get something like this:

$A s2
$B s4,s5
$C NA 

Which means that:

A has only one unique element - s2

B has two unique elements - s4 and s5

and C has no unique elements, so it's filled with NA

I've tried

apply(data, 2, function(x) unique(x))

but that's not what I need.

Thanks a lot for suggestions!


Solution

  • Here is a rough base R solution:

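    # For one row: flag the positive cells, but only if exactly one cell is positive.
    helper <- function(x) {
      has_p <- x > 0
      if (sum(has_p) != 1) has_p[] <- FALSE
      has_p
    }
    # Apply row-wise; t() restores the original rows-by-columns orientation.
    step1 <- as.data.frame(t(apply(df[-1], 1, helper)))
    
    # For each column, return the ids of the flagged rows.
    lapply(step1, function(x) df[[1]][x])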
    
    $A
    [1] "s2"
    
    $B
    [1] "s4" "s5"
    
    $C
    character(0)
    

    Edit

    Here is a much simpler way to express the same logic:

    rows <- rowSums(df[-1] > 0) == 1
    lapply(df[-1], function(x) df[["id"]][rows & x > 0])
    

    Edit 2

    Combined into one step (and returning NA when a column has no unique rows, as requested):

    lapply(
      as.data.frame(df[-1] > 0 & rowSums(df[-1] > 0) == 1),
      function(x) {
        if (all(!x)) return(NA)
        df[["id"]][x]
      }
    )
    

    Data

    df <- structure(list(id = c("s1", "s2", "s3", "s4", "s5"), A = c(1L, 
    1L, 0L, 0L, 0L), B = c(2L, 0L, 12L, 1L, 1L), C = c(1L, 0L, 3L, 
    0L, 0L)), row.names = c(NA, -5L), class = "data.frame")
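
To see why the one-step version works, it can help to look at the intermediate values. The following is a minimal sketch and not part of the original answer: the names `positive`, `single`, `mask`, and `collapsed` are illustrative, and the final collapsing step is only an assumption about how the comma-separated form shown in the question (`s4,s5`) might be produced.

```r
# Same data as in the Data block above, rebuilt so the sketch is self-contained.
df <- data.frame(
  id = c("s1", "s2", "s3", "s4", "s5"),
  A  = c(1L, 1L, 0L, 0L, 0L),
  B  = c(2L, 0L, 12L, 1L, 1L),
  C  = c(1L, 0L, 3L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell holds a positive value.
positive <- df[-1] > 0

# TRUE for rows in which exactly one column is positive (s2, s4, s5 here).
single <- rowSums(positive) == 1

# Keep a cell only if it is positive AND it is the only positive cell in its row.
# `single` has one entry per row and is recycled down each column of the matrix.
mask <- positive & single

# For each column, collect the matching ids; NA when there are none.
res <- lapply(as.data.frame(mask), function(x) {
  if (!any(x)) NA_character_ else df[["id"]][x]
})

# Assumed final step: one comma-separated string per column, keeping a real NA
# (a plain paste() would turn NA into the string "NA").
collapsed <- vapply(res, function(x) {
  if (all(is.na(x))) NA_character_ else paste(x, collapse = ",")
}, character(1))

print(collapsed)
```

Whether to keep the list (`res`) or the collapsed character vector depends on what is done with the result afterwards; the list is usually easier to work with programmatically.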