Search code examples
pythonpandasrolling-computation

Pandas rolling count over multiple columns - Python


I have a tennis dataset which has player 1 and player 2, sometimes a player is player 1 and sometimes they are player 2. The dataset also shows whether the player won or lost. With p1_win and p2_win, if the player won they receive a 1 and if they lose they receive a 0.

This is what I currently have

p1_name p2_name p1_win p2_win
Nadal Murray 1 0
Murray Nadal 0 1
Murray Nadal 0 1

I would like to have a rolling count of how many games the player has won before the start of the match, like below

p1_name p2_name p1_win p2_win p1_totalwins p2_totalwins
Nadal Murray 1 0 0 0
Murray Nadal 1 0 0 1
Murray Nadal 0 1 1 1

This is the current code I have:

df_list = []
for i in range(1, 3):
    tmp = df[["match_id", f"p{i}_name", f"p{i}_win"]].copy()
    tmp.columns = ["match_id", "player", "score"]
    df_list.append(tmp)
df_long = pd.concat(df_list)
rolling_scores = df_long.sort_values("match_id").groupby("player")["score"].cumsum().cumcount().reset_index()
rolling_scores["score"] = rolling_scores.groupby("player")["score"].shift(1).values
for i in range(1, 3):
    tmp = rolling_scores.copy()
    tmp.columns = [f"p{i}_name", "match_id", f"p{i}_total_wins"]
    df = df.merge(tmp, on=["match_id", f"p{i}_name"], how="left")

Solution

  • p1_win, p2_win columns are different across your two tables btw.

    There are many ways to solve your problem. The simplest I could come up with that requires the fewest operations is shown below. This method requires maintaining external state outside of your dataframe.

    external_state_dic = {}
    
    
    def player_win_cumsum(row):
        p1 = row["p1_name"]
        p2 = row["p2_name"]
        p1_wins = external_state_dic.get(p1, 0)
        p2_wins = external_state_dic.get(p2, 0)
        external_state_dic[p1] = p1_wins + row["p1_win"]
        external_state_dic[p2] = p2_wins + row["p2_win"]
    
        return p1_wins, p2_wins
    
    
    df["p1_totalwins"], df["p2_totalwins"] = zip(*df.apply(player_win_cumsum, axis=1))