Search code examples
pythonpandasdataframerunning-count

More elegant way to cumcount based on a condition in python pandas


I have the following df consisting of two teams and their point differentials for games they have played.

df = pd.DataFrame({
    'Team':['A','A','A','A','A','B','B','B','B','B'],
    'Point_Diff':[-4,5,-1,2,2,6,-5,-4,3,-1]}
    )
df


  Team  Point_Diff
0   A   -4
1   A   5
2   A   -1
3   A   2
4   A   2
5   B   6
6   B   -5
7   B   -4
8   B   3
9   B   -1

I want to create a variable Losses which is a cumulative count only where Point_Diff is negative (since that means the team lost).

I have been able to do this, albeit kind of hacky and in multiple lines:

df['Losses'] = df.loc[df.Point_Diff<0].groupby(['Team'])[['Point_Diff']].transform('cumcount')+1

df.Losses = df.Losses.fillna(method='ffill')

df


  Team  Point_Diff  Losses
0   A   -4  1.0
1   A   5   1.0
2   A   -1  2.0
3   A   2   2.0
4   A   2   2.0
5   B   6   2.0
6   B   -5  1.0
7   B   -4  2.0
8   B   3   2.0
9   B   -1  3.0

There has to be a more elegant way to do this, likely with shift() or something, but was unable to figure anything out in terms of incorporating it with a groupby(). I would assume there has to be a one liner that can do this. Again, I have solved my initial task but now I just wish to clean up the code a bit.

EDIT: I'm also noticing my code doesn't even work. There is an error when team B starts in row 5. Losses should be 0, not 2. Not sure why that is.


Solution

  • Not sure if this is what you have in mind:

    df.assign(Losses=df.Point_Diff.lt(0).groupby('Team').cumsum())
    
      Team  Point_Diff  Losses
    0    A          -4       1
    1    A           5       1
    2    A          -1       2
    3    A           2       2
    4    A           2       2
    5    B           6       0
    6    B          -5       1
    7    B          -4       2
    8    B           3       2
    9    B          -1       3