I have a dataset that looks like this:
        IPS10     IPS11    IPS299     IPS12     IPS13     IPS18     IPS25     IPS32     IPS34     IPS38       ...     UTL11     UTL15     UTL17     UTL21     UTL22     UTL29     UTL31     UTL32     UTL33       GDP
0    3.040102  2.949695  3.319379  3.251798  4.525330  0.379066  2.731048  2.643842  2.453547  1.201144       ...  2.978505 -0.944465  3.585314  6.169364 -0.395442  0.433999 -0.350617  0.899361  1.312837 -1.328266
...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...
588 -3.126587 -3.576200 -3.512180 -2.411509 -4.629191 -0.391066 -3.902952 -2.169446 -3.584623  0.082130       ... -2.741805 -2.838139 -3.435455 -3.343945 -0.710171  1.862004 -1.025504 -0.128602 -0.204241 -0.345851
Its shape is
(593, 144)
Now I'd like to:
Could you please help me? Thanks
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# simulate data with the same shape as in the question
n, m = 593, 144
df = pd.DataFrame(np.random.random((n, m)))
# the last column plays the role of the target
df.rename(columns={m - 1: 'GDP'}, inplace=True)
# split data into train / test and X / y
# assuming data ordered chronologically
test_size = 50
train, test = df[:-test_size], df[-test_size:]
X_train, y_train = train.drop(columns='GDP'), train['GDP']
X_test, y_test = test.drop(columns='GDP'), test['GDP']
# linear regression, re-fitted on each rolling window of the training data
window_size = 30
reestimation_frequency = 1
for idx in range(0, train.shape[0] - window_size, reestimation_frequency):
    # positional slicing: rows idx .. idx + window_size - 1
    X_window = X_train.iloc[idx:idx + window_size]
    y_window = y_train.iloc[idx:idx + window_size]
    reg = LinearRegression()
    reg.fit(X_window, y_window)
    # do sth with reg ...
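One possible way to fill in the "do sth with reg" step, as a rough sketch only: use each fitted window to predict the observation immediately after it and collect the forecast errors. The one-step-ahead evaluation, the use of the whole simulated frame instead of the train split, and the names `preds`, `actuals` and `rmse` are assumptions made for illustration; they are not something the question specifies.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# simulated data with the same shape as in the question
# (assumption: the last column is the GDP target)
rng = np.random.default_rng(0)
n, m = 593, 144
df = pd.DataFrame(rng.random((n, m)))
df = df.rename(columns={m - 1: 'GDP'})

X, y = df.drop(columns='GDP'), df['GDP']

window_size = 30
preds, actuals = [], []
# the range stops early enough that a "next" row always exists
for idx in range(0, n - window_size):
    X_window = X.iloc[idx:idx + window_size]
    y_window = y.iloc[idx:idx + window_size]
    reg = LinearRegression().fit(X_window, y_window)
    nxt = idx + window_size  # position of the row right after the window
    preds.append(reg.predict(X.iloc[[nxt]])[0])
    actuals.append(y.iloc[nxt])

preds = np.array(preds)
actuals = np.array(actuals)
rmse = float(np.sqrt(np.mean((preds - actuals) ** 2)))
print(f"{len(preds)} one-step-ahead forecasts, RMSE = {rmse:.4f}")
```

Note that with 143 predictors and only 30 rows per window the least-squares problem is underdetermined, so each fit reproduces its window exactly and may generalize poorly; a larger window or a regularized model might be worth considering.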