Get the value with most occurrences in data frame for each row


Suppose I have a simple data frame

test_df <- data.frame(c(0,0,1,0,0,1,1,1,1,1),c(1,0,0,0,0,0,0,0,0,0))

I want to find which number (0 or 1) occurs most often for each row. In my example that is 1 for the first vector (6 occurrences) and 0 for the second one (9 occurrences).

I started with:

> sapply(test_df,table)
  c.0..0..1..0..0..1..1..1..1..1. c.1..0..0..0..0..0..0..0..0..0.
0                               4                               9
1                               6                               1

So far this looks fine. Then:

> sapply((sapply(test_df,table)),max)
[1] 4 6 9 1

I got lost. Did I lose the associations (1 -> 6, 0 -> 9)? What I want returned is a vector with the "winner": 1,0,...

1 for the first vector (6 occurrences)
0 for the second vector (9 occurrences)
...

Solution

  • This can be done in a single apply statement. It's unclear whether you want the most frequent value for each row or for each column, so here are both (using @akrun's cleaner data set). Each returns a vector showing the 'winner' (either 1 or 0) for each row/column.

    ## Data
    test_df <- data.frame(v1= c(0,0,1,0,0,1,1,1,1,1),
                          v2= c(1,0,0,0,0,0,0,0,0,0),
                          v3= c(1,0,0,0,0,0,0,0,0,1)) 
    #    v1 v2 v3
    # 1   0  1  1
    # 2   0  0  0
    # 3   1  0  0
    # 4   0  0  0
    # 5   0  0  0
    # 6   1  0  0
    # 7   1  0  0
    # 8   1  0  0
    # 9   1  0  0
    # 10  1  0  1
    
    ## Solution - For each row
    ## 1 if the row has more 1s than 0s, otherwise 0 (a tie gives 0)
    apply(test_df, 1, function(x) as.integer(sum(x == 1) > sum(x == 0)))
    
    ## Result
    # [1] 1 0 0 0 0 0 0 0 0 1
    
    ## Solution - For each column
    ## 1 if the column has more 1s than 0s, otherwise 0 (a tie gives 0)
    apply(test_df, 2, function(x) as.integer(sum(x == 1) > sum(x == 0)))
    
    ## Result 
    # v1 v2 v3 
    # 1  0  0
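
The comparison above relies on the data containing only 0 and 1. If other values can appear, one possible variant keeps the association between value and count by reading the winner from the names of the table() result with which.max(). This is only a sketch: most_common() is a hypothetical helper name, not something from the question or the answer above, and it assumes numeric values. On a tie it returns the smallest value, because table() sorts the values and which.max() picks the first maximum.

```r
## Hypothetical helper (illustration only): most frequent value in a vector
most_common <- function(x) {
  counts <- table(x)                          # counts, named by the distinct values
  winner <- names(counts)[which.max(counts)]  # name of the first maximal count
  as.numeric(winner)                          # names are character, convert back
}

test_df <- data.frame(v1 = c(0,0,1,0,0,1,1,1,1,1),
                      v2 = c(1,0,0,0,0,0,0,0,0,0),
                      v3 = c(1,0,0,0,0,0,0,0,0,1))

by_column <- sapply(test_df, most_common)    # one winner per column
by_row    <- apply(test_df, 1, most_common)  # one winner per row
```

For this data set, by_column and by_row should hold the same winners as the two apply() calls above.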