python string pandas cumulative-frequency

Cumulative frequency for string occurence

To start off, a little about my problem.

I have a data frame of winners of the champions league cup indexed by years.

Like this, note team names are strings.

year    team         need this         year    team      wins to date

1       team1                           1       team1          1
2       team2                           2       team2          1
3       team1                           3       team1          2
4       team3                           4       team3          1

I would like to create a new column with cumulative wins, so that i can then plot a line graph of each team and the number of times they have won the cup. ie I need the year on the x axis and the cumulative frequency on the y, with 4 different lines for the top 4 teams.

There is probably an easy way using the count function and a loop but I'm fairly new to python and don't have a very good grasp of these. Any help would be much appreciated!

Solution

Look at pandas documentation on groupby, transform, cumcount, and cumsum (http://pandas.pydata.org/pandas-docs/stable/groupby.html). You can groupby team and use transform to do a cumsum on bool result of whether the team is null or not.

df

Input:

   year   team
0     1  team1
1     2  team2
2     3  team1
3     4  team3

Do transformation

df['wins to date'] = df.groupby('team').transform(lambda x: x.notnull().cumsum())
df

or as DSM suggest:

df['wins to date'] = df.groupby('team').cumcount()+1

output

   year   team  wins to date
0     1  team1             1
1     2  team2             1
2     3  team1             2
3     4  team3             1