Search code examples
rrowsmedian

Computing the median over both rows and columns in a dataframe in R


I have a dataframe that looks like:

Group1 | Group2 | Score1 | Score2 
-------|--------|--------|--------
   A   |    1   |   10   |   11   
   A   |    2   |   13   |   14  
   B   |    1   |    4   |    7   
   B   |    2   |    9   |   12  
   A   |    1   |    5   |   11   
   B   |    2   |    9   |   13  
   A   |    2   |    7   |    9  
   B   |    1   |    7   |   10  

What I need is to create a new variable, which gives me the median value for each subgroup combination, something like

Group1 | Group2 | Score1 | Score2 | Median
-------|--------|--------|--------|-------
   A   |    1   |   10   |   11   |  10.5
   A   |    2   |   13   |   14   |  11.5
   B   |    1   |    4   |    7   |   7.0
   B   |    2   |    9   |   12   |  10.5
   A   |    1   |    5   |   11   |  10.5 
   B   |    2   |    9   |   13   |  10.5
   A   |    2   |    7   |    9   |  11.5
   B   |    1   |    7   |   10   |   7.0

(My median computation by hand could be wrong, but I think the point has been made).

I tried using the aggregate function like:

MedianAggregate <- aggregate(df[,45:47],
                             by = list(df$provider, df$field),
                             FUN = median) 

What I get, instead of what I want, is something that looks like:

Group1 | Group2 | Score1 | Score2 
-------|--------|--------|--------
   A   |    1   |   7.5  |   11  
   A   |    2   |   10   |   11.5   
   B   |    1   |   5.5  |   8.5  
   B   |    2   |    9   |   12.5   

basically what I need is to get the same output I would get if I would apply the median function to three rows in a matrix, rather than a dataframe, for each combination of subgroup.

Any help would be greatly appreciated.


Solution

  • Here's a possible solution, if I understood correctly:

    library(reshape2)
    library(dplyr)
    df2 <- melt(df, id.vars = c("Group1","Group2"))
    df2 %>% group_by(Group1, Group2) %>% summarise(median = median(value)) %>% merge(df, by = c("Group1","Group2"))