Search code examples
pythonpandasdataframeshift

Python: create a lag (t-1) data structure of multiple elements


I'm having trouble creating a time lag column for my data. It works fine when I do it for a dataframe with a just a kind of elements, but it doesn't not work fine, when I have different elements. For example, my dataset looks something like this:

enter image description here

when using the command suggested:

data1['lag_t'] = data1['total_tax'].shift(1)

I get a result like this:

enter image description here

As you can see, it just displace all the 'total_tax' value one row. However, I need to do this lag for EACH ONE of the id_inf (as separate items).

My dataset is really huge, so I need to find a way to solve this issue. So I can get as a result a table like this:

enter image description here


Solution

  • You can groupby on index and shift

    # an example with random data.
    data1 = pd.DataFrame({'id': [9,9,9,54,54,54],'total_tax':[5,6,7,1,2,3]}).set_index('id')
    
    data1['lag_t'] = data1.groupby(level=0)['total_tax'].apply(lambda x: x.shift())
    
    print (data1)
    
        tax  lag_t
    id            
    9     5    NaN
    9     6    5.0
    9     7    6.0
    54    1    NaN
    54    2    1.0
    54    3    2.0