Let's say I have a dataframe df
like this:
Home Away HomeGoals AwayGoals HomePoints AwayPoints
1 Arsenal Chelsea 3 1 3 0
2 Chelsea Liverpool 0 0 1 1
3 Liverpool Arsenal 2 1 3 0
4 Chelsea Arsenal 4 5 0 3
5 Liverpool Chelsea 3 2 3 0
6 Arsenal Liverpool 2 2 1 1
I would like to produce a new dataframe FinalStandings
showing team standings, sorted by Total Points, Goal Difference and Goals Scored, in descending order. So I would need to introduce new columns Team
, TotalPoints
, GoalDiff
, and GoalsScored.
So for Arsenal, the corresponding row of FinalStandings
should look like this:
Team TotalPoints GoalDiff GoalsScored
Arsenal 7 2 11
I suppose I need to use the library(dplyr)
(or maybe some other way) along with some sort of aggregating group_by()
type functions, but I am struggling to implement it and get it to work. Please show how can one achieve the desired result efficiently?
Edit: note that each team appears in both "Home" and "Away" columns in df
, so one cannot just simply group by Home
or Away
column.
One way I can think of is to do group_by
on Home
column and perform some summary operation, then do the same onAway
, then usecbind
to combine together. But seems I need to do the whole operation several times. Is there a faster way?
Another solution which is kind of similar in the single steps to the one from @Jon Spring but uses a different selection strategy to achieve the same result.
Step 1 - create intermediate helper columns GoalDiff
. The columns in df
are alternating for Home
and Away
so we keep this format for the new columns.
df_temp <- df %>%
mutate(HomeGoalDiff = HomeGoals - AwayGoals,
AwayGoalDiff = - HomeGoalDiff)
Step 2 - omit the Home
and Away
prefixes as we know they are alternating and we select the columns by index.
colnames(df_temp) <- rep(c("Team","Goal","Points","Diff"), each=2)
Step 3 - Select the Home
and Away
columns to create another dataframe that can be used to aggregate the metrics per Team.
FinalStandings <- bind_rows(
df_temp %>% select(seq(1,7,2)), # home cols 1,3,5,7
df_temp %>% select(seq(2,8,2)) # away cols 2,4,6,8
) %>%
group_by(Team) %>%
summarise(TotalPoints = sum(Points),
GoalDiff = sum(Diff),
GoalsScored = sum(Goal)) %>%
arrange(desc(TotalPoints), desc(GoalDiff), desc(GoalsScored))
Step 4 - print the result
FinalStandings
# Team TotalPoints GoalDiff GoalsScored
# <chr> <dbl> <dbl> <dbl>
# 1 Liverpool 8 2 7
# 2 Arsenal 7 2 11
# 3 Chelsea 1 -4 7