I have a dataframe which have 3 columns with one of the columns having a list of values.
I want aggregate that list into single list when doing groupby.
DataFrame looks like this
C1 | C2 | C3 |
---|---|---|
1 | ['A','B'] | Hi |
2 | NaN | Po |
1 | ['B','C'] | Yo |
2 | ['D','E'] | Yup |
Now I want my dataframe to look like this
C1 | C2 | C3 |
---|---|---|
1 | ['A','B','C'] | ['Hi','Yo'] |
2 | ['D','E'] | ['Po','Yup'] |
I used the agrregating function with list past as parameter but I am getting the result like this
C1 | C2 | C3 |
---|---|---|
1 | [['A','B'],['B','C']] | ['Hi','Yo'] |
2 | ['D','E'] | ['Po','Yup'] |
Can anyone please help on how to get this result??
Use groupby and and agg to apply different functions to each column:
import pandas as pd
import numpy as np
df = pd.DataFrame({'C1': [1, 2, 1, 2],
'C2': [['A', 'B'], None, ['B', 'C'], ['D', 'E']],
'C3': ['Hi', 'Po', 'Yo', 'Yup']})
df = df.groupby('C1').agg({'C2': lambda x: np.unique([value for value in x.dropna()]),
'C3': list})
df
Output:
C1 | C2 | C3 |
---|---|---|
1 | [A, B, C] | [Hi, Yo] |
2 | [D, E] | [Po, Yup] |