Search code examples
rgroup-bysummary

R: how to create median and agreement function for multiple groups


I have the following data structure:

     Player Team Round Question Answer 
 1:      2    1     1        1      1 
 2:      5    1     1        1      1 
 3:      8    1     1        1      1 
 4:      9    1     1        1      1
 5:     10    1     1        1      1
 6:      2    1     1        2      4
 7:      5    1     1        2      5
 8:      8    1     1        2      5 
 9:      9    1     1        2      5
10:     10    1     1        2      5 
11:      2    1     1        4      4 
12:      5    1     1        4      3 
13:      8    1     1        4      4 
14:      9    1     1        4      2 
15:     10    1     1        4      4 
16: ...

So there are several players from several team, answering several questions. There are always 2 rounds of games.

What I try to calculate is the medium and the agreement coefficient (see agrmt package) from the data by grouping the Team and the Question.

The result should look like this:

      Team Question Median_R1 Agrmt_R1 Median_R2 Agrmt_R2 
 1:      1       1         1        1         1        1 
 2:      1       2         2     0.83         1        1
 3:      ... 
 4:      5      10         4        1         4        1

Does someone know if this is possible? I could not find a solution for this. I can solve the median and the agreement coefficient standalone, but not combined?

Every hint is welcome. Thank you very much.

UPDATE:
The agreement function returns a coefficient between -1 and 1. The values represent.

  • 1 represents a full agreement (e.g. if every player answers 5).
  • 0 would be, if every player has a different answer.
  • -1 would be, if a disagreement exists (some players say answer 1 and others say 5)

enter image description here

Compared to the median, the agreement functions takes a vector of the frequency vector.

For example, we have the following answers

     Player Team Round Question Answer 
 6:      2    1     1        2      4
 7:      5    1     1        2      5
 8:      8    1     1        2      5 
 9:      9    1     1        2      5
10:     10    1     1        2      5 

The function inputs would look like this:
Median input: 4,5,5,5,5 --> Result: 5
Agreement input: 0,0,0,1,4 --> Result: 0.9

UPDATE 2: SOLVED

The calculation of the agreement could be done with the following code:

agreement(table(factor(x, levels=1:5)))

The final is based on @sandipan implementation. I had to add another sorting step in order to combine the right data.frames:

library(agrmt)
df1 <- unique(df[c('Party', 'Question')])
for (df.R in split(df, df$Round)) {
  round <- unique(df.R$Round)
  # get the data.frame of the current Round.
  df2 <- as.data.frame(as.list(aggregate(Answer ~ Party + Question + Round, 
           df.R, FUN = function(x) c(Median = median(x), Agrmt =  agreement(table(factor(x, levels=1:5)))))))
  # sort it and take only the columns of median and agreement
  df3 <- df2[with(df2, order(Party, Question)),][4:5]
  names(df3) <- c(paste('Median_R', round, sep=''), paste('Agrmt_R', round, sep=''))
      df1 <- cbind.data.frame(df1, df3)
}

df1

Thank you all for the help.


Solution

  • Here are three approaches: base R aggregate, dplyr, and data.table.

    With base R aggregate:

    library(agrmt)
    
    aggregate(Answer ~ Team + Round + Question, data=dat,
              FUN = function(x) {
                c(Median=median(x), 
                  Agreement=agreement(table(factor(x, levels=1:5))))
              })
    
      Team Round Question Answer.Median Answer.Agreement
    1    1     1        1           1.0              1.0
    2    1     1        2           5.0              0.9
    3    1     1        4           4.0              0.7
    

    With dplyr:

    library(dplyr)
    
    dat.summary = dat %>% group_by(Team, Round, Question) %>%
      summarise(Median=median(Answer),
                Agreement=agreement(table(factor(Answer, levels=1:5))))
    
       Team Round Question Median Agreement
    1     1     1        1      1       1.0
    2     1     1        2      5       0.9
    3     1     1        4      4       0.7
    

    With data.table:

    library(data.table)
    
    dat.summary = setDT(dat)[, list(Median=median(Answer), 
                                    Agreement=agreement(table(factor(Answer, levels=1:5)))), 
                             by=list(Team, Round, Question)]
    
       Team Round Question Median Agreement
    1:    1     1        1      1       1.0
    2:    1     1        2      5       0.9
    3:    1     1        4      4       0.7
    

    To get a "wide" data frame as the final output:

    In the examples above, I've left the output in "long" format. If you want to reshape to "wide" format, so that each Round get its own set of columns, you can do as follows:

    First, let's add a second Round to the sample data by stacking another copy of the sample data:

    library(dplyr)
    library(reshape2)
    library(agrmt)
    
    dat = bind_rows(dat, dat %>% mutate(Round=2))
    

    Now calculate the median and agreement with the same code we used before in the dplyr example:

    dat.summary = dat %>% 
      group_by(Team, Round, Question) %>%
      summarise(Median=median(Answer),
                Agreement=agreement(table(factor(Answer, levels=1:5))))
    

    Finally, reshape to wide format. This requires first "melting" the data to stack the Median and Agreement columns into a single column, and then casting to wide format. We also include the second line of code to add "Round" to each Round so that we get the column names we want in the wide data frame:

    dat.summary = dat.summary %>%
      mutate(Round = paste0("Round", Round)) %>% 
      melt(id.var=c("Team","Question","Round")) %>%
      dcast(Team + Question ~ variable + Round, value.var="value")
    
      Team Question Median_Round1 Median_Round2 Agreement_Round1 Agreement_Round2
    1    1        1             1             1              1.0              1.0
    2    1        2             5             5              0.9              0.9
    3    1        4             4             4              0.7              0.7