Search code examples
pythonpandasdatasetdata-sciencenp

pandas: how to get if column is greater than x select the max of two columns otherwise select mean?


I have a df that looks like this and want to add an adj mean that selects the max if one of the two columns (avg or rolling_mean) is 0 otherwise it gets the avg of the two columns.

 ID Avg  rolling_mean   adj_mean (goal to have this column)

 0  5    0              5

 1  6    6.3            6.15

 2  5    8              6.5

 3  4    0              4

I was able to get the max value of the columns using this code

 df["adj_mean"]=df[["Avg", "rolling_mean"]].max(axis=1)

but not sure how to add the avg if both values are greater than zero.

Many thanks!


Solution

  • One approach can be to treat 0 as NaN and then simply calculate the mean

    df['adj_mean'] = df.replace({0: np.nan})[["Avg", "rolling_mean"]].mean(axis=1)
    
    Out[1]: 
       rolling_mean  Avg  adj_mean
    0           0.0    5      5.00
    1           6.3    6      6.15
    2           8.0    5      6.50
    3           0.0    4      4.00
    

    By default, df.mean() skips null values. Per the docs:

    skipna : bool, default True Exclude NA/null values when computing the result.