Search code examples
rcountfrequencycumulative-sumcumulative-frequency

Dynamic counting of occurrences


R newb. Small rep of my data.

TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(cbind(TeamHome, TeamAway))
df

   TeamHome TeamAway
     LAL      IND
     HOU      SAS
     SAS      LAL
     LAL      HOU

Imagine these are the first four games of a season with thousands of games. For the home team and the visitor team I want to compute the cumulative number of games played at home, on the road and total. So 3 new columns for both the home team and the visiting team. I would like to get something like this (in this case I am only calculating the new variables for the HOME TEAM):

    TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1      LAL      IND                         1                       0                  1
2      HOU      SAS                         1                       0                  1
3      SAS      LAL                         1                       1                  2
4      LAL      HOU                         2                       1                  3

To compute the first column (HomeTeamGamesPlayedatHome) I managed to do it with:

df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum)

But it feels over complicated and also I can't calculate the other columns with this approach.

I also thought of using the formula table to count the number of occurrences:

 table(df$TeamHome)

but it just computes the totals and I want the result at any given point in time. thanks!


Solution

  • library(dplyr)
    df <- df %>% group_by(TeamHome) %>% 
      mutate(HomeGames = seq_along(TeamHome))
    lst <- list()
    for(i in 1:nrow(df)) lst[[i]] <- sum(df$TeamAway[1:i] == df$TeamHome[i])
    df$HomeTeamGamesPlayedRoad <- unlist(lst)
    df %>% mutate(HomeTeamTotalgames = HomeGames+HomeTeamGamesPlayedRoad)
      TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeGames
    1      LAL      IND         1                       0         1
    2      HOU      SAS         1                       0         1
    3      SAS      LAL         1                       1         2
    4      LAL      HOU         2                       1         3
    

    HomeGames is created with seq_along iterated by row. HomeTeamGamesPlayedRoad is created with a loop checking the values in TeamAway up to and including the current game. The final row is the sum of the other two created.