I have a dataframe df
in R of sports teams, including a column Season
, Date
, and Goals
for every match, like so:
Season Date Home Away Goals
1 2013 06/04/2013 Arsenal Chelsea 3
2 2013 06/04/2013 Tottenham Newcastle 5
3 2013 10/04/2013 Milan Rangers 4
4 2013 07/05/2013 Real Bayern 1
5 2013 07/05/2013 Juventus Inter 2
6 2013 07/06/2013 Barcelona Chelsea 1
7 2014 03/04/2014 Braga Benfica 3
8 2014 13/04/2014 Rangers Chelsea 1
9 2014 08/05/2014 Bayern Tottenham 0
10 2014 10/05/2014 Newcastle Braga 2
I would like to group this data into months for each season and calculate :
My aim is to get a new dataframe like this:
Season month TotalGoalsMonth AvGoalsMonth AvGoalsSeason
1 2013 4 12 4.0 2.67
2 2013 5 3 1.5 2.67
3 2013 6 1 1.0 2.67
4 2014 4 4 2.0 1.50
5 2014 5 2 2.0 1.50
So for season 2013, there were months 4, 5, 6. In months 4 there were 3 matches played yielding a total of 12 goals (TotalGoalsMonth
column). The average goals per month is 12/3=4.0 (AvGoalsMonth
column). And the same for months 5, 6, then repeat for season 2014, etc.
A total of 6 matches were played in 2013, with a total of 16 goals, so average goals per 2013 season is 16 / 6 = 2.67 (AvGoalsSeason
column). Then repeat for season 2014, and so on.
Please explain how to do this in R.
Turn the dates into Date class, extract month from it. Calculate Total goals, Average goals and number of matches for each month.
library(dplyr)
library(lubridate)
df %>%
mutate(Date = dmy(Date),
Month = month(Date)) %>%
summarise(TotalGoalsMonth = sum(Goals),
AvGoalsMonth = mean(Goals),
NumOfMatches = n(),
.by = c(Season, Month)) %>%
mutate(AvGoalsSeason = sum(TotalGoalsMonth)/sum(NumOfMatches), .by = Season)
# Season Month TotalGoalsMonth AvGoalsMonth NumOfMatches AvGoalsSeason
#1 2013 4 12 4.0 3 2.666667
#2 2013 5 3 1.5 2 2.666667
#3 2013 6 1 1.0 1 2.666667
#4 2014 4 4 2.0 2 1.500000
#5 2014 5 2 1.0 2 1.500000
If your data is huge, you may use collapse
whose syntax is similar to that of dplyr
but it is much faster. (Even faster than base R in many cases.) Here's a dplyr
-free version.
library(collapse)
df |>
fmutate(Date = as.Date(Date, '%d/%m/%Y'),
Month = format(Date, '%m')) %>%
fgroup_by(Season, Month) |>
fsummarise(TotalGoalsMonth = sum(Goals),
AvGoalsMonth = mean(Goals),
NumOfMatches = GRPN()) |>
fgroup_by(Season) |>
fmutate(AvGoalsSeason = sum(TotalGoalsMonth)/sum(NumOfMatches))