Search code examples
pythonpandaslistgroup-by

Aggregating the columns having list as values in pandas dataframe


I have a dataframe which have 3 columns with one of the columns having a list of values.

I want aggregate that list into single list when doing groupby.

DataFrame looks like this

C1 C2 C3
1 ['A','B'] Hi
2 NaN Po
1 ['B','C'] Yo
2 ['D','E'] Yup

Now I want my dataframe to look like this

C1 C2 C3
1 ['A','B','C'] ['Hi','Yo']
2 ['D','E'] ['Po','Yup']

I used the agrregating function with list past as parameter but I am getting the result like this

C1 C2 C3
1 [['A','B'],['B','C']] ['Hi','Yo']
2 ['D','E'] ['Po','Yup']

Can anyone please help on how to get this result??


Solution

  • Use groupby and and agg to apply different functions to each column:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'C1': [1, 2, 1, 2],
                       'C2': [['A', 'B'], None, ['B', 'C'], ['D', 'E']],
                       'C3': ['Hi', 'Po', 'Yo', 'Yup']})
    
    df = df.groupby('C1').agg({'C2': lambda x: np.unique([value for value in x.dropna()]),
                                       'C3': list})
    df
    

    Output:

    C1 C2 C3
    1 [A, B, C] [Hi, Yo]
    2 [D, E] [Po, Yup]