Search code examples
pythonpandasdataframemergerows

Arrange dataframe based on values on different columns


Input: df as follows:

appl_id  CNET   CCON    SCORE   DER   TOL_TNW
863793   42.6   4       752     0.4   1.8
863487   0      1       761.5   0.6   2.6
863487   0      1       770     0.6   2.6
863283   0      NaN     691     1.9   7.3
863283   0      5       691     NaN   7.3
900555   NaN    NaN     650     0     NaN

Output Seek:

With respect to appl_id, different values in columns need to be concatenated to list and retain if values are similar.

appl_id  CNET   CCON    SCORE       DER   TOL_TNW
863793   42.6   4       752         0.4   1.8
863487   0      1       [761.5,770] 0.6   2.6
863283   0      5       691         1.9   7.3
900555   NaN    NaN     650         0     NaN

I have tried with

df.set_index('appl_id').T \
  .apply(lambda x: x.shift(len(x) - x.index.get_loc(x.last_valid_index()) - 1)).T

but not solving my purpose. Can anyone have better suggestion how to do this?


Solution

  • Use custom lambda function in GroupBy.agg with convert values to sets and remove missing values, next is for add default values np.nan if out is empty set:

    def f(x):
        out = set(x.dropna())
        return list(out) if len(out) > 1 else next(iter(out), np.nan)
    df = df.groupby('appl_id').agg(f).reset_index()
    print (df)
       appl_id  CNET  CCON           SCORE  DER  TOL_TNW
    0   863283   0.0   5.0           691.0  1.9      7.3
    1   863487   0.0   1.0  [761.5, 770.0]  0.6      2.6
    2   863793  42.6   4.0           752.0  0.4      1.8
    3   900555   NaN   NaN           650.0  0.0      NaN