R newb. Small rep of my data.
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(cbind(TeamHome, TeamAway))
df
TeamHome TeamAway
LAL IND
HOU SAS
SAS LAL
LAL HOU
Imagine these are the first four games of a season with thousands of games. For the home team and the visitor team I want to compute the cumulative number of games played at home, on the road and total. So 3 new columns for both the home team and the visiting team. I would like to get something like this (in this case I am only calculating the new variables for the HOME TEAM):
TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
To compute the first column (HomeTeamGamesPlayedatHome) I managed to do it with:
df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum)
But it feels over complicated and also I can't calculate the other columns with this approach.
I also thought of using the formula table to count the number of occurrences:
table(df$TeamHome)
but it just computes the totals and I want the result at any given point in time. thanks!
library(dplyr)
df <- df %>% group_by(TeamHome) %>%
mutate(HomeGames = seq_along(TeamHome))
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- sum(df$TeamAway[1:i] == df$TeamHome[i])
df$HomeTeamGamesPlayedRoad <- unlist(lst)
df %>% mutate(HomeTeamTotalgames = HomeGames+HomeTeamGamesPlayedRoad)
TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeGames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
HomeGames
is created with seq_along
iterated by row. HomeTeamGamesPlayedRoad
is created with a loop checking the values in TeamAway
up to and including the current game. The final row is the sum of the other two created.