I have a pandas DataFrame that looks like this:
Scaled
Date
2020-07-01 02:40:00 0.604511
2020-07-01 02:45:00 0.640577
2020-07-01 02:50:00 0.587683
2020-07-01 02:55:00 0.491515
....
I am trying to add a new column called X in which each row holds the two previous values of Scaled as an array (NaN for the first two rows, which have no complete history):
Scaled X
Date
2020-07-01 02:40:00 0.604511 nan
2020-07-01 02:45:00 0.640577 nan
2020-07-01 02:50:00 0.587683 [0.604511 0.640577]
2020-07-01 02:55:00 0.491515 [0.640577 0.587683]
...
I tried a for-loop, but it is not working as intended, and I don't think a loop is the most elegant or efficient approach anyway. Is there a better way to do this in pandas?
window_size = 2
for i in range(window_size, df.shape[0]):
    df['X'][i] = df['Scaled'][i - window_size:window_size]
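For comparison, here is a minimal sketch of the loop with its two problems fixed. The slice i - window_size:window_size always stops at the constant position window_size rather than at i, so it should end at i. Writing df['X'][i] raises a KeyError because column X does not exist yet, and chained assignment of this kind is unreliable in pandas in any case. The sketch collects the windows in a plain list and assigns the column once. The DataFrame built here only reproduces the four sample rows from the question.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the four sample rows from the question
df = pd.DataFrame(
    {"Scaled": [0.604511, 0.640577, 0.587683, 0.491515]},
    index=pd.date_range("2020-07-01 02:40:00", periods=4, freq="5min", name="Date"),
)

window_size = 2
values = df["Scaled"].to_numpy()

# The first window_size rows have no complete history, so they get NaN
windows = [np.nan] * window_size
for i in range(window_size, len(values)):
    # The slice must end at i (exclusive), not at the constant window_size
    windows.append(values[i - window_size:i])

# Assign the whole column at once; dtype=object lets each cell hold an array
df["X"] = pd.Series(windows, index=df.index, dtype=object)
print(df)
```

This still loops in Python, so it mainly shows what the original loop was meant to do; the vectorized answer below avoids the explicit loop.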
To do this in pandas, you can combine a list comprehension with pd.concat and shift:
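import pandas as pd
window_size = 2
# Column i is Scaled shifted up by i rows; shifting the result down by
# window_size leaves the window_size previous values in each row
df['X'] = (pd.concat([df.Scaled.shift(-i) for i in range(window_size)], axis=1)
             .shift(window_size).values.tolist())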
Output:
Scaled X
0 0.604511 [nan, nan]
1 0.640577 [nan, nan]
2 0.587683 [0.604511, 0.6405770000000001]
3 0.491515 [0.6405770000000001, 0.587683]
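Another vectorized route, assuming NumPy 1.20 or later, is numpy.lib.stride_tricks.sliding_window_view, which returns every length-window_size window of the column as the rows of a 2-D view. Window k covers positions k through k + window_size - 1, so the row at position i (for i >= window_size) needs window i - window_size. That means dropping the last window and padding the front with NaN. This is a sketch of an alternative, not part of the original answer; the frame again only reproduces the four sample rows.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative frame: the four sample rows from the question
df = pd.DataFrame(
    {"Scaled": [0.604511, 0.640577, 0.587683, 0.491515]},
    index=pd.date_range("2020-07-01 02:40:00", periods=4, freq="5min", name="Date"),
)

window_size = 2
values = df["Scaled"].to_numpy()

# Shape (len(values) - window_size + 1, window_size); row k is values[k:k + window_size]
windows = sliding_window_view(values, window_size)

# Row i needs windows[i - window_size]; the last window (ending at the final row)
# is never used. Copies are taken because the view is read-only.
x = [np.nan] * window_size + [w.copy() for w in windows[:-1]]
df["X"] = pd.Series(x, index=df.index, dtype=object)
print(df)
```

Unlike the concat/shift version, this keeps NaN (rather than [nan, nan]) in the first window_size rows, which matches the layout asked for in the question.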