Based on the simplifed sample dataframe
import pandas as pd
import numpy as np
timestamps = pd.date_range(start='2017-01-01', end='2017-01-5', inclusive='left')
values = np.arange(0,len(timestamps))
df = pd.DataFrame({'A': values ,'B' : values*2},
index = timestamps )
print(df)
A B
2017-01-01 0 0
2017-01-02 1 2
2017-01-03 2 4
2017-01-04 3 6
I want to use a roll-forward window of size 2 with a stride of 1 to create a resulting dataframe like
timestep_1 timestep_2 target
0 A 0 1 2
B 0 2 4
1 A 1 2 3
B 2 4 6
I.e., each window step should create a data item with the two values of A and B in this window and the A and B values immediately to the right of the window as target values.
My first idea was to use pandas
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
But that seems to only work in combination with aggregate functions such as sum, which is a different use case.
Any ideas on how to implement this rolling-window-based sampling approach?
Here is one way to do it:
window_size = 3
new_df = pd.concat(
[
df.iloc[i : i + window_size, :]
.T.reset_index()
.assign(other_index=i)
.set_index(["other_index", "index"])
.set_axis([f"timestep_{j}" for j in range(1, window_size)] + ["target"], axis=1)
for i in range(df.shape[0] - window_size + 1)
]
)
new_df.index.names = ["", ""]
print(df)
# Output
timestep_1 timestep_2 target
0 A 0 1 2
B 0 2 4
1 A 1 2 3
B 2 4 6