I'm trying to use python's pandas groupby, apply, where and quantile to replace values that fall below a 50% quantile with NaN by 'date' group however it seems to be returning lists in the cells. How can I get these results in a new column after the column 'value'.
This is the code I have (any other approaches are welcome). It returns lists in cells:
In[0]: df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[0]:
date value
2019-12-23 [nan, nan, 3.0, 4.0, 5.0]
2014-08-13 [nan, nan, 3.0, 4.0, 5.0]
If I create a new column it returns NaN in new column:
In[1]: df['new_value']= df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[1]:
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 NaN
3 2019-12-23 4.0 NaN
4 2019-12-23 5.0 NaN
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 NaN
8 2014-08-13 4.0 NaN
9 2014-08-13 5.0 NaN
I would like to get to this:
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 3.0
3 2019-12-23 4.0 4.0
4 2019-12-23 5.0 5.0
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 3.0
8 2014-08-13 4.0 4.0
9 2014-08-13 5.0 5.0
Instead of apply
you can use transform
df["new_value"] = df.groupby("date")["value"].transform(
lambda x: np.where(x < x.quantile(0.5), np.nan, x)
)
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 3.0
3 2019-12-23 4.0 4.0
4 2019-12-23 5.0 5.0
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 3.0
8 2014-08-13 4.0 4.0
9 2014-08-13 5.0 5.0