Search code examples
pandaspandas-groupby

How to group by one column if condition is true in another column summing values in third column with pandas


I can't think of how to do this: As the headline explains I want to group a dataframe by the column acquired_month only if another column contains Closed Won(in the example I made a helper column that just marks True if that condition is fulfilled although I'm not sure that step is necessary). Then if those conditions are met I want to sum the values of a third column but can't think how to do it. Here is my code so far:

us_lead_scoring.loc[us_lead_scoring['Stage'].str.contains('Closed Won'), 'closed_won_binary'] = True acquired_date = us_lead_scoring.groupby('acquired_month')['closed_won_binary'].sum()

but this just sums the true false column not the sum column if the true false column is true after the acquired_month groupby. Any direction appreciated.

Thanks


Solution

  • If need aggregate column col replace non matched values to 0 values in Series.where and then aggregate sum:

    us_lead_scoring = pd.DataFrame({'Stage':['Closed Won1','Closed Won2','Closed', 'Won'],
                                    'col':[1,3,5,6],
                                    'acquired_month':[1,1,1,2]})
    
    out = (us_lead_scoring['col'].where(us_lead_scoring['Stage']
                                 .str.contains('Closed Won'), 0)
                                 .groupby(us_lead_scoring['acquired_month'])
                                 .sum()
                                 .reset_index(name='SUM'))
         
    print (out)
       acquired_month  SUM
    0               1    4
    1               2    0