I'm having trouble creating a time lag column for my data. It works fine when I do it for a dataframe with a just a kind of elements, but it doesn't not work fine, when I have different elements. For example, my dataset looks something like this:
when using the command suggested:
data1['lag_t'] = data1['total_tax'].shift(1)
I get a result like this:
As you can see, it just displace all the 'total_tax' value one row. However, I need to do this lag for EACH ONE of the id_inf (as separate items).
My dataset is really huge, so I need to find a way to solve this issue. So I can get as a result a table like this:
You can groupby
on index and shift
# an example with random data.
data1 = pd.DataFrame({'id': [9,9,9,54,54,54],'total_tax':[5,6,7,1,2,3]}).set_index('id')
data1['lag_t'] = data1.groupby(level=0)['total_tax'].apply(lambda x: x.shift())
print (data1)
tax lag_t
id
9 5 NaN
9 6 5.0
9 7 6.0
54 1 NaN
54 2 1.0
54 3 2.0