Pandas rolling count over multiple columns - Python

I have a tennis dataset which has player 1 and player 2, sometimes a player is player 1 and sometimes they are player 2. The dataset also shows whether the player won or lost. With p1_win and p2_win, if the player won they receive a 1 and if they lose they receive a 0.

This is what I currently have

p1_name	p2_name	p1_win	p2_win
Nadal	Murray	1	0
Murray	Nadal	0	1
Murray	Nadal	0	1

I would like to have a rolling count of how many games the player has won before the start of the match, like below

p1_name	p2_name	p1_win	p2_win	p1_totalwins	p2_totalwins
Nadal	Murray	1	0	0	0
Murray	Nadal	1	0	0	1
Murray	Nadal	0	1	1	1

This is the current code I have:

df_list = []
for i in range(1, 3):
    tmp = df[["match_id", f"p{i}_name", f"p{i}_win"]].copy()
    tmp.columns = ["match_id", "player", "score"]
    df_list.append(tmp)
df_long = pd.concat(df_list)
rolling_scores = df_long.sort_values("match_id").groupby("player")["score"].cumsum().cumcount().reset_index()
rolling_scores["score"] = rolling_scores.groupby("player")["score"].shift(1).values
for i in range(1, 3):
    tmp = rolling_scores.copy()
    tmp.columns = [f"p{i}_name", "match_id", f"p{i}_total_wins"]
    df = df.merge(tmp, on=["match_id", f"p{i}_name"], how="left")

Solution

p1_win, p2_win columns are different across your two tables btw.

There are many ways to solve your problem. The simplest I could come up with that requires the fewest operations is shown below. This method requires maintaining external state outside of your dataframe.

external_state_dic = {}


def player_win_cumsum(row):
    p1 = row["p1_name"]
    p2 = row["p2_name"]
    p1_wins = external_state_dic.get(p1, 0)
    p2_wins = external_state_dic.get(p2, 0)
    external_state_dic[p1] = p1_wins + row["p1_win"]
    external_state_dic[p2] = p2_wins + row["p2_win"]

    return p1_wins, p2_wins


df["p1_totalwins"], df["p2_totalwins"] = zip(*df.apply(player_win_cumsum, axis=1))