
How to find non-zero median/mean of multiple columns in pandas?


I have a long list of columns for which I want to calculate the non-zero median, mean and std in one go. I cannot just delete rows that contain a 0 in one column, because the value of another column in the same row may not be 0.

Below is the code I currently have, which calculates the median, mean, etc. with the zeros included.

    import numpy as np

    agg_list_oper = {
        'ABC1': [max, np.std, np.mean, np.median],
        'ABC2': [max, np.std, np.mean, np.median],
        'ABC3': [max, np.std, np.mean, np.median],
        'ABC4': [max, np.std, np.mean, np.median],
        # ..... (more columns, same pattern)
    }

    df=df_tmp.groupby(['id']).agg(agg_list_oper).reset_index()

I know I can write long code with loops that processes one column at a time. Is there an elegant way to do this with pandas groupby.agg() or some other function?


Solution

  • You can temporarily replace the 0's with NaN before grouping. pandas skips NaN by default in its groupby aggregations, so the max, std, mean and median are then computed from the non-zero values only.

    df_tmp.replace(0, np.nan).groupby(['id']).agg(agg_list_oper).reset_index()
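
One caveat with the one-liner above: `replace(0, np.nan)` is applied to the whole frame, so an `id` of 0 would also become NaN, and `groupby` drops NaN keys by default. A minimal sketch that replaces zeros in the value columns only is shown below. The sample data and the `value_cols` list are made up for illustration, and the aggregations are passed as strings, which is an assumption about preference rather than a requirement (recent pandas versions warn when NumPy callables such as `np.mean` are passed to `agg`).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_tmp = pd.DataFrame({
    'id':   [0, 0, 0, 1, 1, 1],
    'ABC1': [0, 2, 4, 0, 0, 6],
    'ABC2': [1, 0, 3, 5, 7, 0],
})

# Columns to summarize (assumed here; in practice, the long list of columns).
value_cols = ['ABC1', 'ABC2']

# Build the aggregation dict instead of typing every column by hand.
agg_list_oper = {col: ['max', 'std', 'mean', 'median'] for col in value_cols}

# Replace zeros with NaN in the value columns only, then group by the
# untouched id column so that an id of 0 is kept as a group.
df = (
    df_tmp[value_cols]
    .replace(0, np.nan)
    .groupby(df_tmp['id'])
    .agg(agg_list_oper)
    .reset_index()
)

print(df)
```

A group that has only one non-zero value gets NaN for std, because pandas uses the sample standard deviation (`ddof=1`) by default.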