r dataframe frequency

Get column name for first occurrence of most frequent value in a row


I have a data frame that looks like the following:

week_0 <- c(5,0,1,0,0,1)
week_1 <- c(5,0,4,0,2,1)
week_2 <- c(5,0,4,0,8,1)
week_3 <- c(5,0,4,0,8,3)
week_4 <- c(1,0,4,0,8,3)
week_5 <- c(1,0,4,0,8,3)
week_6 <- c(1,0,4,0,1,3)
week_7 <- c(1,0,4,0,1,3)
week_8 <- c(1,0,6,0,3,4)
week_9 <- c(2,4,6,7,3,4)
week_10 <- c(2,4,6,7,3,4)
Participant <- c("Lion","Cat","Dog","Snake","Tiger","Mouse")
test_data <- data.frame(Participant,week_0,week_1,week_2,week_3,week_4,week_5,week_6,week_7,week_8,week_9,week_10)

> test_data

    Participant week_0 week_1 week_2 week_3 week_4 week_5 week_6 week_7 week_8 week_9 week_10
1        Lion      5      5      5      5      1      1      1      1      1      2       2
2         Cat      0      0      0      0      0      0      0      0      0      4       4
3         Dog      1      4      4      4      4      4      4      4      6      6       6
4       Snake      0      0      0      0      0      0      0      0      0      7       7
5       Tiger      0      2      8      8      8      8      1      1      3      3       3
6       Mouse      1      1      1      3      3      3      3      3      4      4       4

I would like to identify the value in each row that appears more often than any other value. For example, in the first row that value is 1, and the output I want is week_4, the column where it first appears. In the second row the most frequent value is 0, so the output should be week_0, and so on. The end result should be: week_4, week_0, week_1, week_0, week_2, week_3. I have tried:

apply(test_data, 1, function(x) names(which.max(table(x))))

but I do not get the result that I'm searching for. Any suggestions on how to do this?


Solution

  • Your code is a good first step. It gives you the most frequent value in each row; pass that value to match() to find its first position in the row, then use that position to index into the column names:

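    apply(test_data[, -1], 1, function(x) {
      # most frequent value in the row, as a character string
      val <- names(which.max(table(x)))
      # column name at the first position where that value occurs
      names(test_data)[-1][[match(val, x)]]
    })
    # "week_4" "week_0" "week_1" "week_0" "week_2" "week_3"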
    

    Note that I use test_data[, -1] to exclude the Participant column; otherwise, the code would return the participant name if there’s no value that occurs more than once, which presumably isn’t what you want.
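
One caveat worth checking against your data: table() sorts its entries by value, so which.max(table(x)) breaks ties in favour of the smallest value, not the value that appears first in the row. If ties should go to whichever value shows up earliest, something along these lines may work. This is only a sketch: the helper name first_mode_name and the demo data frame (including the "Owl" row) are made up for illustration and are not part of the question.

```r
# Hypothetical helper (not from the question or the answer above):
# returns the name of the first element of x that holds the most frequent
# value; on ties, the value that appears earliest in x wins.
first_mode_name <- function(x) {
  pos <- match(x, x)                          # first position of each element's value
  counts <- tabulate(pos, nbins = length(x))  # frequency, indexed by first position
  names(x)[which.max(counts)]                 # which.max keeps the earliest on ties
}

# Made-up demo data: "Owl" has a tie between 2 and 1 (two occurrences each).
demo <- data.frame(
  Participant = c("Lion", "Owl"),
  week_0 = c(5, 2), week_1 = c(5, 1), week_2 = c(1, 1),
  week_3 = c(1, 2), week_4 = c(1, 3)
)

# Drop the Participant column so each row stays numeric inside apply().
res <- apply(demo[, -1], 1, first_mode_name)
print(res)
```

For the "Owl" row this picks week_0 (where 2 first appears), whereas the table() version would pick week_1, because "1" sorts before "2". On the test_data from the question there are no ties, so both approaches should agree.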