I have a tennis dataset which has player 1 and player 2, sometimes a player is player 1 and sometimes they are player 2. The dataset also shows whether the player won or lost. With p1_win and p2_win, if the player won they receive a 1 and if they lose they receive a 0.
This is what I currently have
p1_name | p2_name | p1_win | p2_win |
---|---|---|---|
Nadal | Murray | 1 | 0 |
Murray | Nadal | 0 | 1 |
Murray | Nadal | 0 | 1 |
I would like to have a rolling count of how many games the player has won before the start of the match, like below
p1_name | p2_name | p1_win | p2_win | p1_totalwins | p2_totalwins |
---|---|---|---|---|---|
Nadal | Murray | 1 | 0 | 0 | 0 |
Murray | Nadal | 1 | 0 | 0 | 1 |
Murray | Nadal | 0 | 1 | 1 | 1 |
This is the current code I have:
df_list = []
for i in range(1, 3):
tmp = df[["match_id", f"p{i}_name", f"p{i}_win"]].copy()
tmp.columns = ["match_id", "player", "score"]
df_list.append(tmp)
df_long = pd.concat(df_list)
rolling_scores = df_long.sort_values("match_id").groupby("player")["score"].cumsum().cumcount().reset_index()
rolling_scores["score"] = rolling_scores.groupby("player")["score"].shift(1).values
for i in range(1, 3):
tmp = rolling_scores.copy()
tmp.columns = [f"p{i}_name", "match_id", f"p{i}_total_wins"]
df = df.merge(tmp, on=["match_id", f"p{i}_name"], how="left")
p1_win, p2_win columns are different across your two tables btw.
There are many ways to solve your problem. The simplest I could come up with that requires the fewest operations is shown below. This method requires maintaining external state outside of your dataframe.
external_state_dic = {}
def player_win_cumsum(row):
p1 = row["p1_name"]
p2 = row["p2_name"]
p1_wins = external_state_dic.get(p1, 0)
p2_wins = external_state_dic.get(p2, 0)
external_state_dic[p1] = p1_wins + row["p1_win"]
external_state_dic[p2] = p2_wins + row["p2_win"]
return p1_wins, p2_wins
df["p1_totalwins"], df["p2_totalwins"] = zip(*df.apply(player_win_cumsum, axis=1))