Input: df
as follows:
appl_id CNET CCON SCORE DER TOL_TNW
863793 42.6 4 752 0.4 1.8
863487 0 1 761.5 0.6 2.6
863487 0 1 770 0.6 2.6
863283 0 NaN 691 1.9 7.3
863283 0 5 691 NaN 7.3
900555 NaN NaN 650 0 NaN
Output Seek:
With respect to appl_id, different values in columns need to be concatenated to list and retain if values are similar.
appl_id CNET CCON SCORE DER TOL_TNW
863793 42.6 4 752 0.4 1.8
863487 0 1 [761.5,770] 0.6 2.6
863283 0 5 691 1.9 7.3
900555 NaN NaN 650 0 NaN
I have tried with
df.set_index('appl_id').T \
.apply(lambda x: x.shift(len(x) - x.index.get_loc(x.last_valid_index()) - 1)).T
but not solving my purpose. Can anyone have better suggestion how to do this?
Use custom lambda function in GroupBy.agg
with convert values to sets and remove missing values, next
is for add default values np.nan
if out
is empty set:
def f(x):
out = set(x.dropna())
return list(out) if len(out) > 1 else next(iter(out), np.nan)
df = df.groupby('appl_id').agg(f).reset_index()
print (df)
appl_id CNET CCON SCORE DER TOL_TNW
0 863283 0.0 5.0 691.0 1.9 7.3
1 863487 0.0 1.0 [761.5, 770.0] 0.6 2.6
2 863793 42.6 4.0 752.0 0.4 1.8
3 900555 NaN NaN 650.0 0.0 NaN