Create a new column with unique values from another in python Pandas - without grouping


I already posted a question, but I presented it badly. Here is my problem:

I have a dataframe like this:

Col1  Col2  Col3   Col4  DESIRED COLUMN
SF    123   01/02  UP    UP, WA
BF    543   30/12  DO    DO, AF
QW    241   30/12  AF    DO, AF
SF    123   01/02  WA    UP, WA
QW    789   20/11  D     D
SF    678   31/12  OT    OT

I want to obtain the DESIRED COLUMN. The logic is as follows:

  • Check whether Col1 is equal to SF or not
  • If Col1 is NOT equal to SF, group by Col3 and aggregate the items of Col4
  • If Col1 is equal to SF, group by Col2 and Col3 and aggregate the items of Col4

Where is my problem? Maybe I am overthinking it, but using groupby in pandas reduces the number of rows in the dataframe. My goal is simply to add a column, as shown above.

Thank you in advance!!!


Solution

  • Use GroupBy.transform with numpy.where:

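    import numpy as np

    # Boolean mask: True where Col1 is 'SF'
    m = df.Col1.eq('SF')
    
    # transform returns one value per original row, so no rows are lost
    s1 = df.groupby(['Col2','Col3'])['Col4'].transform(', '.join)
    s2 = df.groupby(['Col3'])['Col4'].transform(', '.join)
    # Take s1 for the SF rows and s2 for all other rows
    df['DESIRED COLUMN'] = np.where(m, s1, s2)
    print(df)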
      Col1  Col2   Col3 Col4 DESIRED COLUMN
    0   SF   123  01/02   UP         UP, WA
    1   BF   543  30/12   DO         DO, AF
    2   QW   241  30/12   AF         DO, AF
    3   SF   123  01/02   WA         UP, WA
    4   QW   789  20/11    D              D
    5   SF   678  31/12   OT             OT
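
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the dataframe from the question by hand (the dtypes are an assumption, with Col2 taken as integers) and compares `agg`, which collapses each group to one row, with `transform`, which keeps the original row count. The name `collapsed` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; Col2 is assumed to be integer-typed.
df = pd.DataFrame({
    "Col1": ["SF", "BF", "QW", "SF", "QW", "SF"],
    "Col2": [123, 543, 241, 123, 789, 678],
    "Col3": ["01/02", "30/12", "30/12", "01/02", "20/11", "31/12"],
    "Col4": ["UP", "DO", "AF", "WA", "D", "OT"],
})

# agg returns one row per group, which is why the row count shrinks.
collapsed = df.groupby("Col3")["Col4"].agg(", ".join)

# transform broadcasts each group's result back onto the original rows.
m = df["Col1"].eq("SF")
s1 = df.groupby(["Col2", "Col3"])["Col4"].transform(", ".join)
s2 = df.groupby("Col3")["Col4"].transform(", ".join)

# Use the (Col2, Col3) grouping for SF rows and the Col3 grouping otherwise.
df["DESIRED COLUMN"] = np.where(m, s1, s2)

print(len(collapsed), len(s2))
print(df)
```

Here `collapsed` has one entry per distinct Col3 value, while `s1` and `s2` have the same length and index as `df`, so they can be assigned as a column directly.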
    

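    A possible solution if Col2 or Col3 contain NaNs (groupby excludes NaN keys by default, so they are replaced with a placeholder string first):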

    m = df.Col1.eq('SF')
    
    df1 = df.fillna({'Col2':'nan', 'Col3':'nan'})
    s1 = df1.groupby(['Col2','Col3'])['Col4'].transform(', '.join)
    s2 = df1.groupby(['Col3'])['Col4'].transform(', '.join)
    df['DESIRED COLUMN'] = np.where(m, s1, s2)
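
To illustrate the NaN variant, here is a small sketch with made-up data in which two rows have a missing Col3. The sample values are an assumption for demonstration and are not taken from the question. pandas 1.1 and later also accept `dropna=False` in `groupby`; that option is not used here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 1 and 2 have a missing Col3.
df = pd.DataFrame({
    "Col1": ["SF", "BF", "QW", "SF", "QW"],
    "Col2": [123, 543, 241, 123, 789],
    "Col3": ["01/02", np.nan, np.nan, "01/02", "20/11"],
    "Col4": ["UP", "DO", "AF", "WA", "D"],
})

m = df["Col1"].eq("SF")

# Replace missing keys with the placeholder string 'nan' so those rows form a group.
df1 = df.fillna({"Col3": "nan"})
s1 = df1.groupby(["Col2", "Col3"])["Col4"].transform(", ".join)
s2 = df1.groupby("Col3")["Col4"].transform(", ".join)

# Assign back to the original df, which still holds the real NaN values.
df["DESIRED COLUMN"] = np.where(m, s1, s2)
print(df)
```

Note that with this approach all rows with a missing key end up in one shared group, so their Col4 values are joined together.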