Search code examples
pythonpandaspandas-groupbyquantilepandas-apply

Groupby Apply Quantile Replacement


I'm trying to use python's pandas groupby, apply, where and quantile to replace values that fall below a 50% quantile with NaN by 'date' group however it seems to be returning lists in the cells. How can I get these results in a new column after the column 'value'.

This is the code I have (any other approaches are welcome). It returns lists in cells:

In[0]: df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))  
Out[0]:                            
date                            value     
2019-12-23  [nan, nan, 3.0, 4.0, 5.0]
2014-08-13  [nan, nan, 3.0, 4.0, 5.0]

If I create a new column it returns NaN in new column:

In[1]: df['new_value']= df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[1]: 
        date  value    new_value
0 2019-12-23      1.0       NaN
1 2019-12-23      2.0       NaN
2 2019-12-23      3.0       NaN
3 2019-12-23      4.0       NaN
4 2019-12-23      5.0       NaN
5 2014-08-13      1.0       NaN
6 2014-08-13      2.0       NaN
7 2014-08-13      3.0       NaN
8 2014-08-13      4.0       NaN
9 2014-08-13      5.0       NaN

I would like to get to this:

        date     value    new_value
0 2019-12-23      1.0        NaN
1 2019-12-23      2.0        NaN
2 2019-12-23      3.0        3.0
3 2019-12-23      4.0        4.0
4 2019-12-23      5.0        5.0
5 2014-08-13      1.0        NaN
6 2014-08-13      2.0        NaN
7 2014-08-13      3.0        3.0
8 2014-08-13      4.0        4.0
9 2014-08-13      5.0        5.0

Solution

  • Instead of apply you can use transform

    df["new_value"] = df.groupby("date")["value"].transform(
        lambda x: np.where(x < x.quantile(0.5), np.nan, x)
    )
    
    
        date    value   new_value
    0   2019-12-23  1.0     NaN
    1   2019-12-23  2.0     NaN
    2   2019-12-23  3.0     3.0
    3   2019-12-23  4.0     4.0
    4   2019-12-23  5.0     5.0
    5   2014-08-13  1.0     NaN
    6   2014-08-13  2.0     NaN
    7   2014-08-13  3.0     3.0
    8   2014-08-13  4.0     4.0
    9   2014-08-13  5.0     5.0