
How to sum based on variable level using dplyr?


I have a dataset containing some sports performance data. Below is a small example.

Player.Name Period.Name Average.Distance Total.HIR V6.Distance   Date
Player 1    Quarter 1           2240.744    588.31       84.42   2/3/18
Player 2    Quarter 1           3008.554    833.94       10.50   2/3/18
Player 3    Quarter 1           2907.660    1020.78      58.52   2/3/18
Player 1    Quarter 2           2747.222    903.37       82.41   2/3/18
Player 2    Quarter 2           2225.836    679.79       31.32   2/3/18
Player 3    Quarter 2           3445.327    1034.16      108.20  2/3/18
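To make the question easy to reproduce, the sample rows above can be recreated as an R data frame. This is a sketch: storing Date as plain text (e.g. "2/3/18") is an assumption, since the real column in matchdb2018 may be a Date or factor.

```r
# Sample rows from the question. Date is kept as a character string (assumption).
matchdb2018 <- data.frame(
  Player.Name      = rep(c("Player 1", "Player 2", "Player 3"), times = 2),
  Period.Name      = rep(c("Quarter 1", "Quarter 2"), each = 3),
  Average.Distance = c(2240.744, 3008.554, 2907.660, 2747.222, 2225.836, 3445.327),
  Total.HIR        = c(588.31, 833.94, 1020.78, 903.37, 679.79, 1034.16),
  V6.Distance      = c(84.42, 10.50, 58.52, 82.41, 31.32, 108.20),
  Date             = "2/3/18",          # recycled to all six rows
  stringsAsFactors = FALSE
)

str(matchdb2018)
```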

I'm trying to use dplyr to add Quarter 1 and Quarter 2 together for each of Average.Distance, Total.HIR and V6.Distance, grouped by Player.Name and Date (my dataset, matchdb2018, contains many dates). This is the code I have so far:

library(dplyr)
summary <- matchdb2018 %>%
  group_by(Player.Name, Date) %>%

I'm not sure how to continue from here, or how to sum based on the level of a variable (here, the levels of Period.Name).

Any help will be greatly appreciated.


Solution

  • This should do the job. You probably also want to keep the result as a data frame rather than a tibble, hence the as.data.frame() at the end.

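    library(dplyr)
    summary <- matchdb2018 %>%
      group_by(Player.Name, Date) %>%                  # one group per player per date
      summarise(tot_dist = sum(Average.Distance),      # totals over all Period.Name
                tot_hir  = sum(Total.HIR),             # rows in each group
                tot_v6   = sum(V6.Distance)) %>%
      as.data.frame()                                  # drop grouping and tibble class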
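If matchdb2018 also holds other periods (for example Quarter 3 and Quarter 4), the code above would add those in as well. A minimal sketch that restricts the sum to the two quarter levels first, assuming dplyr 1.0.0 or later (needed for across() and the .groups argument). The Player 1 and Player 2 rows from the question serve as stand-in data, and the name `totals` is just illustrative.

```r
library(dplyr)

# Stand-in data: the Player 1 and Player 2 rows from the question
matchdb2018 <- data.frame(
  Player.Name      = rep(c("Player 1", "Player 2"), times = 2),
  Period.Name      = rep(c("Quarter 1", "Quarter 2"), each = 2),
  Average.Distance = c(2240.744, 3008.554, 2747.222, 2225.836),
  Total.HIR        = c(588.31, 833.94, 903.37, 679.79),
  V6.Distance      = c(84.42, 10.50, 82.41, 31.32),
  Date             = "2/3/18"
)

totals <- matchdb2018 %>%
  # keep only the Period.Name levels that should be added together
  filter(Period.Name %in% c("Quarter 1", "Quarter 2")) %>%
  group_by(Player.Name, Date) %>%
  # apply sum() to each listed column; .groups = "drop" returns an ungrouped result
  summarise(across(c(Average.Distance, Total.HIR, V6.Distance), sum),
            .groups = "drop") %>%
  as.data.frame()

totals
```

For Player 1 this gives Average.Distance 4987.966, Total.HIR 1491.68 and V6.Distance 166.83. Using across() keeps the original column names and avoids writing sum() once per column; the filter() line can be dropped if Quarter 1 and Quarter 2 are the only periods in the data.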