I am new to sklearn. I have an assignment on Linear Regression, Logistic Regression, etc., and I am trying to prepare the data so I can compare the results. My data looks like:
Closing_Price Daily_Returns Daily_Returns_1 Daily_Returns_2 Daily_Returns_3 Daily_Returns_4 Daily_Returns_5
Date
1980-12-22 0.53 0.058269 0.040822 0.042560 0.021979 -0.085158 -0.040005
1980-12-23 0.55 0.037041 0.058269 0.040822 0.042560 0.021979 -0.085158
1980-12-24 0.58 0.053110 0.037041 0.058269 0.040822 0.042560 0.021979
1980-12-26 0.63 0.082692 0.053110 0.037041 0.058269 0.040822 0.042560
1980-12-29 0.64 0.015748 0.082692 0.053110 0.037041 0.058269 0.040822
To start with, I want to use sklearn's linear regression to compute predictions and plot them alongside the Daily Returns. This is what I am doing:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression as lr
from sklearn.linear_model import LogisticRegression as lor
# Feature: closing price; target: daily return (both reshaped to column vectors)
X = apple['Closing_Price'].values.reshape(-1, 1)
y = apple['Daily_Returns'].values.reshape(-1, 1)
# Random 80/20 split of plain arrays, so the date index is not carried along
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
LinReg = lr()
LinReg.fit(X_train, y_train)
LinRegPred = LinReg.predict(X_test)
My question: is it possible to create a 2D array whose first column holds the index values (dates) of the original DataFrame and whose second column holds the corresponding Linear Regression predictions?
where apple.index is:
DatetimeIndex(['1980-12-22', '1980-12-23', '1980-12-24', '1980-12-26',
'1980-12-29', '1980-12-30', '1980-12-31', '1981-01-02',
'1981-01-05', '1981-01-06',
...
'2019-05-22', '2019-05-23', '2019-05-24', '2019-05-28',
'2019-05-29', '2019-05-30', '2019-05-31', '2019-06-03',
'2019-06-04', '2019-06-05'],
dtype='datetime64[ns]', name='Date', length=9695, freq=None)
You could apply train_test_split to the DataFrame itself rather than to the extracted arrays; the resulting train and test DataFrames keep their Date index:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression as lr
from sklearn.linear_model import LogisticRegression as lor

# Split the whole DataFrame so both parts keep their DatetimeIndex
data_train, data_test = train_test_split(apple, test_size=0.2)

X_train = data_train['Closing_Price'].values.reshape(-1, 1)
y_train = data_train['Daily_Returns'].values.reshape(-1, 1)
X_test = data_test['Closing_Price'].values.reshape(-1, 1)
y_test = data_test['Daily_Returns'].values.reshape(-1, 1)

LinReg = lr()
LinReg.fit(X_train, y_train)
LinRegPred = LinReg.predict(X_test)  # shape (n_test, 1), rows in data_test order
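To see why splitting the DataFrame is enough, here is a small self-contained check. Since the original data is not available, it builds a hypothetical stand-in for `apple` from synthetic random values (not real prices), and `random_state=42` is added only to make the shuffled split reproducible. It shows that both parts are still DataFrames indexed by date, and that row i of X_test comes from the row labelled data_test.index[i].

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for `apple`: 50 business days of synthetic values
rng = np.random.default_rng(0)
apple = pd.DataFrame(
    {
        "Closing_Price": rng.uniform(0.5, 1.5, 50),
        "Daily_Returns": rng.normal(0.0, 0.02, 50),
    },
    index=pd.bdate_range("1980-12-22", periods=50, name="Date"),
)

data_train, data_test = train_test_split(apple, test_size=0.2, random_state=42)

# Both parts are still DataFrames indexed by date, just in shuffled order
print(type(data_test).__name__, len(data_train), len(data_test))
print(data_test.index[:3])

# Row i of X_test comes from the row labelled data_test.index[i]
X_test = data_test["Closing_Price"].values.reshape(-1, 1)
first_date = data_test.index[0]
print(first_date, X_test[0, 0] == apple.loc[first_date, "Closing_Price"])
```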
Then you can access the test set's dates through data_test.index and pair them with the predictions as follows:
import pandas as pd
# data_test.index holds the dates of the test rows, in the same order as LinRegPred;
# ravel() flattens the (n_test, 1) prediction array to 1-D
pdi = pd.DataFrame({
    'Date': data_test.index,
    'Predicted_Linear_Regression': LinRegPred.ravel(),
})
# train_test_split shuffles rows by default, so sort by date before plotting
pdi = pdi.sort_values('Date').reset_index(drop=True)
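If you would rather keep the arrays from the question, train_test_split also accepts any number of equal-length arrays and splits them all with the same row selection, so the index can be passed in as a third array. The sketch below (again on hypothetical synthetic data standing in for `apple`) does that and then uses np.column_stack to build a literal two-column array. Because dates and floats are different types, the result has `object` dtype, which is one reason a DataFrame is usually the more convenient container.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for `apple`
rng = np.random.default_rng(1)
apple = pd.DataFrame(
    {
        "Closing_Price": rng.uniform(0.5, 1.5, 40),
        "Daily_Returns": rng.normal(0.0, 0.02, 40),
    },
    index=pd.bdate_range("1980-12-22", periods=40, name="Date"),
)

X = apple["Closing_Price"].values.reshape(-1, 1)
y = apple["Daily_Returns"].values.reshape(-1, 1)

# The index is split with exactly the same row selection as X and y
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, apple.index, test_size=0.2, random_state=0
)

LinReg = LinearRegression().fit(X_train, y_train)
LinRegPred = LinReg.predict(X_test)  # shape (n_test, 1) because y is 2-D

# pd.DatetimeIndex(...) normalises idx_test whether it comes back as an Index or an array;
# astype(object) gives Timestamp objects so they can sit next to floats
dates_test = pd.DatetimeIndex(idx_test).astype(object)

# Column 0: dates, column 1: predictions (object dtype, since the types differ)
pred_2d = np.column_stack((dates_test, LinRegPred.ravel()))
print(pred_2d[:3])
```

Because this is daily price data, you may also want to pass shuffle=False to train_test_split so that the test set is the most recent 20% of dates rather than a random sample; whether that fits the assignment is a judgment call.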
I hope this answers your question.