Get max and min values based on conditions in a pandas DataFrame


I have a dataframe like this:

count A B Total
yes 4900 0 0
yes 1000 1000 0
sum_yes 5900 1000 0
yes 4000 0 0
yes 1000 0 0
sum_yes 5000 0 0

I want a result like this: only for rows where 'count' == 'sum_yes', Total should be the max of columns A and B if the value of B is 0, otherwise the min of A and B:

count A B Total
yes 4900 0 0
yes 1000 1000 0
sum_yes 5900 1000 1000
yes 4000 0 0
yes 1000 0 0
sum_yes 5000 0 5000

I have tried this so far:

df['Total'] = [df[['A', 'B']].where(df['count'] == 'sum_yes').max(axis=0) if 
                   'B'==0 else df[['A', 'B']]
                   .where(df['count'] == 'sum_yes').min(axis=0)]

But I am getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any idea how to solve this?


Solution

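  • You can use numpy.where, which takes the row-wise max where the condition holds and the row-wise min everywhere else:

    import numpy as np

    # Row-wise max of A and B where count == "sum_yes" and B == 0;
    # row-wise min of A and B for every other row.
    new_values = np.where((df["count"] == "sum_yes") & (df["B"] == 0),
                          df[["A", "B"]].max(axis=1),
                          df[["A", "B"]].min(axis=1))

    # assign returns a new DataFrame; df itself is not modified.
    df.assign(Total=new_values)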
    
    
         count     A     B  Total
    0      yes  4900     0      0
    1      yes  1000     0      0
    2  sum_yes  5900  1000   1000
    3      yes  4000  1000   1000
    4      yes  1000     0      0
    5  sum_yes  5000     0   5000
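
Note that the numpy.where call above applies the min to every row that does not match the condition, including plain 'yes' rows. If those rows should keep their existing Total, as in the question's expected output, one possible variation is numpy.select with a default. This is only a sketch: the sample frame is copied from the question, and the helper names (is_sum, b_is_zero, ab, result) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "count": ["yes", "yes", "sum_yes", "yes", "yes", "sum_yes"],
    "A": [4900, 1000, 5900, 4000, 1000, 5000],
    "B": [0, 1000, 1000, 0, 0, 0],
    "Total": [0, 0, 0, 0, 0, 0],
})

is_sum = df["count"] == "sum_yes"   # rows that should be recomputed
b_is_zero = df["B"] == 0
ab = df[["A", "B"]]

# Conditions are evaluated in order; rows matching neither condition
# fall through to the default and keep their existing Total.
total = np.select(
    [is_sum & b_is_zero, is_sum & ~b_is_zero],
    [ab.max(axis=1).to_numpy(), ab.min(axis=1).to_numpy()],
    default=df["Total"].to_numpy(),
)

result = df.assign(Total=total)
print(result)
```

With the question's data, this leaves Total at 0 on the 'yes' rows and fills in 1000 and 5000 on the two 'sum_yes' rows.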