Search code examples
rdataframegroup-bydata-manipulationdata-wrangling

Apply different functions to a groupby and output the results as rows instead of columns in R


Let's say I have the R dataframe

major = c("math","math", "math", "econ", "econ", "econ", "chem","chem", "chem")
name = c("bob1", "bob2", "bob3", "carl1", "carl2", "carl3", "carl4","dave", "emily")
test1 = c(96,87,67,93,99,100,72,65,59)
test2 = c(46,78,90,95,76,85,67,99,91)

df = data.frame(name, major, test1, test2)

df

Which looks like:

   name major test1 test2
1  bob1  math    96    46
2  bob2  math    87    78
3  bob3  math    67    90
4 carl1  econ    93    95
5 carl2  econ    99    76
6 carl3  econ   100    85
7 carl4  chem    72    67
8  dave  chem    65    99
9 emily  chem    59    91

I want to group by major and then for each major, compute various metrics and output them all in one column instead of a separate column for each. eg

  major     statistic              test1              test2
1  math Weighted mean [calculated score] [calculated Score]
2  math        median [calculated score] [calculated Score]
3  econ Weighted mean [calculated score] [calculated Score]
4  econ        median [calculated score] [calculated Score]
5  chem Weighted mean [calculated score] [calculated Score]
6  chem        median [calculated score] [calculated Score]

I know one option would be to make a dataframe for just statistc = 'Weighted mean' and one for statistc = 'median' and then to take the union of the two, but I want to know if there is a more direct way that will generalize to more than two statistics.


Solution

  • You can create a named vector and reference the names in the other column

    df |> 
      group_by(major) |> 
      summarise(score = c(median = median(score), mean = mean(score)), statistic= names(score)) |> 
      ungroup()
    
      major score statistic
      <chr> <dbl> <chr>    
    1 chem   4    median   
    2 chem   3.5  mean     
    3 econ   2.7  median   
    4 econ   2.87 mean     
    5 math   3.4  median   
    6 math   3.37 mean 
    

    For multiple test scores:

    df |> 
      group_by(major) |> 
      summarise(across(starts_with("test"), \(x) c(median = median(x), mean = mean(x))),
                statistic = names(test1)) |> 
        ungroup()
    
      major test1 test2 statistic
      <chr> <dbl> <dbl> <chr>    
    1 chem   65    91   median   
    2 chem   65.3  85.7 mean     
    3 econ   99    85   median   
    4 econ   97.3  85.3 mean     
    5 math   87    78   median   
    6 math   83.3  71.3 mean