Search code examples
pythonpandasdataframegroup-by

Assign Last Value of DataFrame Group to All Entries of That Group


In Python Pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column to all rows of another column.

I know that I am able to select the last row of the group by this command:

import pandas as pd

df = pd.DataFrame({'a': (1,1,2,3,3), 'b':(20,21,30,40,41)})
print(df)
print("-")
result = df.groupby('a').nth(-1)
print(result)

Result:

   a   b
0  1  20
1  1  21
2  2  30
3  3  40
4  3  41
-
    b
a    
1  21
2  30
3  41

How would it be possible to assign the result of this operation back to the original dataframe so that I have something like:

   a   b b_new
0  1  20 21
1  1  21 21
2  2  30 30
3  3  40 41
4  3  41 41

Solution

  • Use transform with last:

    df['b_new'] = df.groupby('a')['b'].transform('last')
    

    Alternative:

    df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    
    print(df)
       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
    

    Solution with nth and join:

    df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
    print(df)
       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
    

    Timings:

    N = 10000
    
    df = pd.DataFrame({'a':np.random.randint(1000,size=N),
                       'b':np.random.randint(10000,size=N)})
    
    #print (df)
    
    
    def f(df):
        return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
    
    #cᴏʟᴅsᴘᴇᴇᴅ1
    In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
    100 loops, best of 3: 3.57 ms per loop
    
    #cᴏʟᴅsᴘᴇᴇᴅ2
    In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
    10 loops, best of 3: 71.3 ms per loop
    
    #jezrael1
    In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
    1000 loops, best of 3: 1.82 ms per loop
    
    #jezrael2
    In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    10 loops, best of 3: 178 ms per loop
        
    #jezrael3
    In [219]: %timeit f(df)
    100 loops, best of 3: 3.63 ms per loop
    

    Caveat

    The results do not address performance given the number of groups, which will affect timings a lot for some of these solutions.