Not getting subtotals when grouping in R


Every time the player changes, I need a subtotal of how many strikeouts he had in his career.

I have tried doing it using the code below but was not getting subtotals.

player <- c('acostma01', 'acostma01', 'acostma01', 'adkinjo01', 'aguilri01', 'aguilri01', 'aguilri01', 'aguilri01', 'aguilri01')
year <- c(2010,2011,2012,2007,1985,1986,1987,1988,1989)
games <- c(41,44,45,1,21,28,18,11,36)
strikeouts <- c(42,46,46,0,74,104,77,16,80)
bb_data <- data.frame(player, year, games, strikeouts, stringsAsFactors = FALSE)

Here is the code that did not work:

mets <- select(bb_data, player, year, games, strikeouts) %>% 
group_by(player, year) %>% 
colSums(SO)

Here is the output I would like to get:

player      games strikeouts
acostma01   130   134
adkinjo01   1     0
aguilri01   0     351
Grand Total       485

Here is what I was getting (tail of data):

player    team    year  games strikeouts
<chr>     <chr>   <int> <int> <int>
swarzan01 NYN      2018    29    31
syndeno01 NYN      2018    25   155
vargaja01 NYN      2018    20    84
wahlbo01  NYN      2018     7     7
wheelza01 NYN      2018    29   179
zamorda01 NYN      2018    16    16

Solution

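  • You could group by player only and sum the columns, then append a total row. In the sample data, grouping by player and year leaves one row per group, so nothing gets added up: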

    library(tidyverse)
    
    bb_data %>% 
      group_by(player) %>% 
      summarise_at(vars(games, strikeouts), sum) %>%
      add_row(player = 'Grand Total', games = NA, strikeouts = sum(.$strikeouts))
    

    This would give you:

    # A tibble: 4 x 3
      player      games strikeouts
      <chr>       <dbl>      <dbl>
    1 acostma01     130        134
    2 adkinjo01       1          0
    3 aguilri01     114        351
    4 Grand Total    NA        485
    

    This matches your expected output except for games for aguilri01 (114 here rather than 0). I presume that is a typo in the question, but let me know if this is incorrect.
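
    In dplyr 1.0.0 and later, summarise_at() is superseded in favour of across(). A minimal sketch of the same idea in that style, assuming a recent dplyr is installed; the intermediate names per_player and result are only for illustration:

    ```r
    library(dplyr)
    library(tibble)

    # Sample data from the question
    bb_data <- data.frame(
      player = c("acostma01", "acostma01", "acostma01", "adkinjo01",
                 "aguilri01", "aguilri01", "aguilri01", "aguilri01", "aguilri01"),
      year = c(2010, 2011, 2012, 2007, 1985, 1986, 1987, 1988, 1989),
      games = c(41, 44, 45, 1, 21, 28, 18, 11, 36),
      strikeouts = c(42, 46, 46, 0, 74, 104, 77, 16, 80),
      stringsAsFactors = FALSE
    )

    # One row per player, summing games and strikeouts
    per_player <- bb_data %>%
      group_by(player) %>%
      summarise(across(c(games, strikeouts), sum), .groups = "drop")

    # Append the grand-total row; games stays NA, as in the code above
    result <- per_player %>%
      add_row(player = "Grand Total", games = NA,
              strikeouts = sum(per_player$strikeouts))

    print(result)
    ```

    Storing the per-player table first avoids relying on the `.` placeholder inside add_row(), which some people find harder to read.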

    For descending order, you could do:

    bb_data %>% 
      group_by(player) %>% 
      summarise_at(vars(games, strikeouts), sum) %>%
      arrange(-strikeouts) %>%
      add_row(player = 'Grand Total', games = NA, strikeouts = sum(.$strikeouts))
    

    Output:

    # A tibble: 4 x 3
      player      games strikeouts
      <chr>       <dbl>      <dbl>
    1 aguilri01     114        351
    2 acostma01     130        134
    3 adkinjo01       1          0
    4 Grand Total    NA        485
    

    To also include the seasons played, you can try:

    bb_data %>% 
      group_by(player) %>% 
      mutate(seasons_played = n_distinct(year)) %>%
      group_by(player, seasons_played) %>%
      summarise_at(vars(games, strikeouts), sum) %>% 
      arrange(-strikeouts) %>%
      ungroup() %>%
      add_row(player = 'Grand Total', games = NA, seasons_played = NA, strikeouts = sum(.$strikeouts))
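
    The seasons count can probably also be computed inside a single summarise() call instead of a mutate() followed by a second group_by(). A sketch under the same assumptions as before (dplyr 1.0.0 or later; with_seasons and result are illustrative names, not from the question):

    ```r
    library(dplyr)
    library(tibble)

    # Sample data from the question
    bb_data <- data.frame(
      player = c("acostma01", "acostma01", "acostma01", "adkinjo01",
                 "aguilri01", "aguilri01", "aguilri01", "aguilri01", "aguilri01"),
      year = c(2010, 2011, 2012, 2007, 1985, 1986, 1987, 1988, 1989),
      games = c(41, 44, 45, 1, 21, 28, 18, 11, 36),
      strikeouts = c(42, 46, 46, 0, 74, 104, 77, 16, 80),
      stringsAsFactors = FALSE
    )

    # Count distinct years and sum the stats in one step, then sort
    with_seasons <- bb_data %>%
      group_by(player) %>%
      summarise(
        seasons_played = n_distinct(year),
        across(c(games, strikeouts), sum),
        .groups = "drop"
      ) %>%
      arrange(desc(strikeouts))

    # Append the grand-total row for strikeouts only
    result <- with_seasons %>%
      add_row(player = "Grand Total", seasons_played = NA, games = NA,
              strikeouts = sum(with_seasons$strikeouts))

    print(result)
    ```

    Here arrange(desc(strikeouts)) does the same job as arrange(-strikeouts) but also works for non-numeric columns.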