Search code examples
rdplyrcategorical-datamode

Mode computation on counted categorical variables


Here is my dataset:

X Totally.Disagree Disagree Agree Totally.agree
0                2        9   111           122
1                2       30   124            88
2                4       31   119            90
3               10       43   138            53
4               33       54    85            72
5               43       79    89            33
6               48       83    94            19
7               51       98    80            15
8               50      102    75            17
9               51       96    80            17

Where X (thus each line) is a question and the values are the number of people who choose this answer to the question. I would like to compute the mode (the most chosen answer) for each question.

This is what I tried:

df <- gather(df,Answer, count, Totally.Disagree:Totally.agree )
df %>% 
  group_by(X, Answer) %>%
  summarise(sum = count)%>%
  summarise(mode = df$Answer[which(df$count== max(df$count))])

But it doesn't work because max(df$count) refers to the whole dataset and not just one question.

I do not now if the way I tried is correct. If one of you could help me resolve this, I would be very grateful.


Solution

  • A different approach could be:

    df %>%
     mutate(mode = max.col(.[2:length(.)])+1) %>%
     rowwise() %>%
     mutate(mode = names(.)[[mode]]) %>%
     select(X, mode)
    
           X mode         
       <int> <chr>        
     1     0 Totally.agree
     2     1 Agree        
     3     2 Agree        
     4     3 Agree        
     5     4 Agree        
     6     5 Agree        
     7     6 Agree        
     8     7 Disagree     
     9     8 Disagree     
    10     9 Disagree  
    

    Here it, first, identifies the index of the column with the greatest count and then assigns the name of the column based on the column index.

    And if you want to include also the numbers, you can try:

    df %>%
     mutate(mode = max.col(.[2:length(.)])+1) %>%
     rowwise() %>%
     mutate(mode_names =  names(.)[[mode]], 
            mode_numbers = max(!!! rlang::syms(names(.)[2:length(.)]))) %>%
     select(X, mode_names, mode_numbers)
    
           X mode_names    mode_numbers
       <int> <chr>                <dbl>
     1     0 Totally.agree         122.
     2     1 Agree                 124.
     3     2 Agree                 119.
     4     3 Agree                 138.
     5     4 Agree                  85.
     6     5 Agree                  89.
     7     6 Agree                  94.
     8     7 Disagree               98.
     9     8 Disagree              102.
    10     9 Disagree               96.
    

    Or following your original logic:

    df %>%
     gather(mode_names, mode_numbers, -X) %>%
     group_by(X) %>%
     filter(mode_numbers == max(mode_numbers)) %>%
     arrange(X)
    
           X mode_names    mode_numbers
       <int> <chr>                <int>
     1     0 Totally.agree          122
     2     1 Agree                  124
     3     2 Agree                  119
     4     3 Agree                  138
     5     4 Agree                   85
     6     5 Agree                   89
     7     6 Agree                   94
     8     7 Disagree                98
     9     8 Disagree               102
    10     9 Disagree                96