To start off, a little about my problem.
I have a data frame of winners of the champions league cup indexed by years.
Like this, note team names are strings.
year team need this year team wins to date
1 team1 1 team1 1
2 team2 2 team2 1
3 team1 3 team1 2
4 team3 4 team3 1
I would like to create a new column with cumulative wins, so that i can then plot a line graph of each team and the number of times they have won the cup. ie I need the year on the x axis and the cumulative frequency on the y, with 4 different lines for the top 4 teams.
There is probably an easy way using the count function and a loop but I'm fairly new to python and don't have a very good grasp of these. Any help would be much appreciated!
Look at pandas documentation on groupby
, transform
, cumcount
, and cumsum
(http://pandas.pydata.org/pandas-docs/stable/groupby.html).
You can groupby team and use transform to do a cumsum on bool result of whether the team is null or not.
df
Input:
year team
0 1 team1
1 2 team2
2 3 team1
3 4 team3
Do transformation
df['wins to date'] = df.groupby('team').transform(lambda x: x.notnull().cumsum())
df
or as DSM suggest:
df['wins to date'] = df.groupby('team').cumcount()+1
output
year team wins to date
0 1 team1 1
1 2 team2 1
2 3 team1 2
3 4 team3 1