pandas pandas-groupby transform method-chaining

How to use groupby transform across columns using method chaining?


Using method chaining, I'm looking to create a new column that holds, for every row in a group, the value of col_2 from that group's row where col_1 == 0.

import numpy as np
import pandas as pd

np.random.seed(1)

df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})

    group   col_1   col_2
0     A       -1      5
1     A        0      8
2     A        1      9
3     B       -1      5
4     B        0      0
5     B        1      0
6     C        0      1
7     C        1      7
8     D       -1      6
9     D        0      9
10    D        1      2
11    D        2      4

Desired output:

    group   col_1   col_2   new_col
0     A       -1       5      8
1     A        0       8      8
2     A        1       9      8
3     B       -1       5      0
4     B        0       0      0
5     B        1       0      0
6     C        0       1      1
7     C        1       7      1
8     D       -1       6      9
9     D        0       9      9
10    D        1       2      9
11    D        2       4      9

My approach using groupby transform (I would love for this to work, but apparently transform only passes one column at a time to the function):

df.assign(
    new_col = lambda df_: df_.groupby('group').transform(lambda x: x.loc[x.col_1==0].col_2)
)
AttributeError: 'Series' object has no attribute 'col_1'

I came up with this solution while writing the question, but thought I'd post anyway:

df.assign(
    new_col = lambda df_: df_.merge(df_.groupby('group')
     .apply(lambda x: x.loc[x.col_1==0].col_2)
     .reset_index().rename(columns={'col_2':'new_col'}), on='group'
    ).new_col
)

Is there a better way?


Solution

  • Use Series.where to replace every col_2 value with NaN where col_1 is not 0, then use GroupBy.transform('first') to broadcast the first non-NaN value of each group to all of its rows:

    df = df.assign(
        new_col = lambda df_: df_['col_2'].where(df_['col_1'] == 0)
                                          .groupby(df_['group']).transform('first')
    )
    print (df)
       group  col_1  col_2  new_col
    0      A     -1      5      8.0
    1      A      0      8      8.0
    2      A      1      9      8.0
    3      B     -1      5      0.0
    4      B      0      0      0.0
    5      B      1      0      0.0
    6      C      0      1      1.0
    7      C      1      7      1.0
    8      D     -1      6      9.0
    9      D      0      9      9.0
    10     D      1      2      9.0
    11     D      2      4      9.0
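
To make the two steps easier to follow, here is a minimal sketch that splits the chain into intermediate variables. The names `masked`, `broadcast` and `out` are only illustrative, and col_2 is hard-coded from the table in the question instead of relying on the random seed. The final cast back to int is an optional extra, and it assumes every group has a row with col_1 == 0 (otherwise NaN remains and the cast fails).

```python
import pandas as pd

# Same data as in the question, with col_2 written out explicitly
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': [5, 8, 9, 5, 0, 0, 1, 7, 6, 9, 2, 4]})

# Step 1: keep col_2 only where col_1 == 0; every other row becomes NaN
masked = df['col_2'].where(df['col_1'] == 0)

# Step 2: within each group, broadcast the first non-NaN value to all rows
broadcast = masked.groupby(df['group']).transform('first')

# Optional: NaN forced a float dtype, so cast back to int
# (assumption: no group is left without a col_1 == 0 row)
out = df.assign(new_col=broadcast.astype(int))
print(out)
```

The float values in the printed output above come from step 1: introducing NaN forces col_2 to a float dtype, and transform keeps it.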
    

    Another solution: filter the rows where col_1 == 0, index them by the group column with DataFrame.set_index, and pass the resulting Series to Series.map:

    df = df.assign(
        new_col = lambda df_: df_['group'].map(df_.loc[df_['col_1'] == 0]
                                                  .set_index('group')['col_2'])
    )
    print (df)
       group  col_1  col_2  new_col
    0      A     -1      5        8
    1      A      0      8        8
    2      A      1      9        8
    3      B     -1      5        0
    4      B      0      0        0
    5      B      1      0        0
    6      C      0      1        1
    7      C      1      7        1
    8      D     -1      6        9
    9      D      0      9        9
    10     D      1      2        9
    11     D      2      4        9
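
The map approach can be sketched with the lookup Series pulled out into its own variable, which shows what is being mapped. The name `lookup` is only illustrative, and col_2 is hard-coded from the table in the question. The `drop_duplicates` call is an assumption added here, not part of the original answer: Series.map needs a uniquely indexed Series, so if a group could contain more than one row with col_1 == 0, this keeps the first one.

```python
import pandas as pd

# Same data as in the question, with col_2 written out explicitly
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': [5, 8, 9, 5, 0, 0, 1, 7, 6, 9, 2, 4]})

# Lookup Series: index is the group label, value is col_2 at the col_1 == 0 row
lookup = (df.loc[df['col_1'] == 0]
            .drop_duplicates('group')   # assumption: keep the first match per group
            .set_index('group')['col_2'])

# Map each row's group label to its looked-up value
out = df.assign(new_col=lambda df_: df_['group'].map(lookup))

print(lookup.to_dict())
print(out)
```

Because no NaN is introduced when every group has a match, new_col keeps an integer dtype here, unlike in the where/transform version.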