How to get the lagged values of a variable based on groups with pandas?

I would like to find the most efficient way to group data by runs of consecutive equal values and create a new variable that holds, for each group, the value of the previous group (except for the first group, where the values should stay the same).

I might not be really clear at this point, so here is a toy example:

import pandas as pd

var = [1,1,1,2,2,3,1,1,2,4,4,4]
toy_df = pd.DataFrame(var, columns = ['var'])

My desired output is the following:

desired_output = pd.DataFrame(
                {'var': var,
                 'lagged_var':[1,1,1,1,1,2,3,3,1,2,2,2]}
                )

    var  lagged_var
0     1           1
1     1           1
2     1           1
3     2           1
4     2           1
5     3           2
6     1           3
7     1           3
8     2           1
9     4           2
10    4           2
11    4           2

So far I have come up with the following function which does the trick:

def make_lag(var):
    groups = ( var.shift() != var ).cumsum()
    var_shifted = pd.Series([0]*len(var))
    for n_gp in groups.unique():
        if n_gp == 1: 
            var_shifted[groups == n_gp] = var[groups == n_gp]
        else:
            var_shifted[groups == n_gp] = var[groups == n_gp - 1].iloc[0]
    
    return ( var_shifted )

And

toy_df['lagged_var'] = make_lag(toy_df['var'])

gives the desired output. However, I suspect it is highly inefficient, since it loops over every group in Python. Does anyone know a vectorized method that produces the same output? (I have to repeat this task for hundreds of very long time series, so it would be a real time-saver!)

Thank you very much!

Solution

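Use Series.shift to get the previous row's value, mask it with NaN wherever it equals the original value (so only the first row of each group keeps a value), then spread those values over each group by forward filling and fill the first group by back filling: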

s = toy_df['var'].shift()
toy_df['new'] = s.mask(toy_df['var'].eq(s)).ffill().bfill()
print(toy_df)
    var  new
0     1  1.0
1     1  1.0
2     1  1.0
3     2  1.0
4     2  1.0
5     3  2.0
6     1  3.0
7     1  3.0
8     2  1.0
9     4  2.0
10    4  2.0
11    4  2.0
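
To see why this works, it can help to look at each intermediate Series separately. The following is only an illustrative sketch on the toy data from the question; the names prev, starts, lagged and steps are introduced here for the illustration and are not part of the answer above.

```python
import pandas as pd

var = [1, 1, 1, 2, 2, 3, 1, 1, 2, 4, 4, 4]
toy_df = pd.DataFrame({'var': var})

# Step 1: value of the previous row (NaN in the first row)
prev = toy_df['var'].shift()

# Step 2: keep the previous value only on the first row of each new group;
# rows that repeat the previous value become NaN
starts = prev.mask(toy_df['var'].eq(prev))

# Step 3: forward fill spreads each kept value over its group,
# back fill then covers the first group
lagged = starts.ffill().bfill()

steps = pd.DataFrame({'var': toy_df['var'], 'prev': prev,
                      'starts': starts, 'lagged': lagged})
print(steps)
```

The columns are float because shift introduces NaN in the first row, which is why the result prints as 1.0, 2.0 and so on.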

To convert the values to integers:

s = toy_df['var'].shift()
toy_df['new'] = s.mask(toy_df['var'].eq(s)).ffill().bfill().astype(int)
print(toy_df)
    var  new
0     1    1
1     1    1
2     1    1
3     2    1
4     2    1
5     3    2
6     1    3
7     1    3
8     2    1
9     4    2
10    4    2
11    4    2
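
Since the question mentions repeating this for hundreds of series, the same idea could be wrapped in a small helper. This is a sketch under assumptions, not a tested production function: the name lag_by_run and the second column 'b' are made up for illustration, and the input is assumed to contain no missing values. It fills the first group with fillna(s) instead of bfill, because with bfill a series made of a single group would stay all NaN and the integer cast would fail.

```python
import pandas as pd

def lag_by_run(s: pd.Series) -> pd.Series:
    """For each row, return the value of the previous group of equal values.

    Rows in the first group keep their own value.
    Assumes `s` contains no missing values.
    """
    prev = s.shift()
    lagged = prev.mask(s.eq(prev)).ffill()
    # The first group is still NaN here: fill it with the original values,
    # then restore the original dtype
    return lagged.fillna(s).astype(s.dtype)

var = [1, 1, 1, 2, 2, 3, 1, 1, 2, 4, 4, 4]
toy_df = pd.DataFrame({'var': var})
toy_df['lagged_var'] = lag_by_run(toy_df['var'])
print(toy_df)

# Several series stored as columns: apply the helper column by column
many = pd.DataFrame({'a': var,
                     'b': [5, 5, 6, 6, 6, 7, 7, 5, 5, 5, 8, 8]})
lagged_many = many.apply(lag_by_run)
print(lagged_many)
```

DataFrame.apply still loops over the columns in Python, but the work inside each column is vectorized, so the cost of that loop grows with the number of series rather than with their length.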