machine-learning, scikit-learn, jupyter-notebook, training-data

Python SKLearn training test data


This is my first time working on machine learning. I have an assignment to run Logistic and Bayesian Regression from sklearn on Apple stock returns and compare the results with linear regression + TensorFlow. I am not sure whether I am correct in understanding that before I run Logistic Regression I have to train on my data set. I was trying to do that. My data looks like:

            Closing_Price  Daily_Returns  Daily_Returns_1  Daily_Returns_2  Daily_Returns_3  Daily_Returns_4  Daily_Returns_5
Date
1980-12-22           0.53       0.058269         0.040822         0.042560         0.021979        -0.085158        -0.040005
1980-12-23           0.55       0.037041         0.058269         0.040822         0.042560         0.021979        -0.085158
1980-12-24           0.58       0.053110         0.037041         0.058269         0.040822         0.042560         0.021979
1980-12-26           0.63       0.082692         0.053110         0.037041         0.058269         0.040822         0.042560
1980-12-29           0.64       0.015748         0.082692         0.053110         0.037041         0.058269         0.040822

When I run

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.2)

I get the error NameError: name 'X' is not defined

Your assistance is greatly appreciated. Thank you in advance for your time.


Solution

  • I watched a lot of YouTube videos, and for some reason they skip this step. You have to define X and y before calling train_test_split, like:

    X = apple['Closing_Price'].values.reshape(-1,1)

    y = apple['Daily_Returns'].values.reshape(-1,1)
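
For reference, here is a minimal end-to-end sketch of the fix. It rebuilds a small stand-in for the `apple` DataFrame from the five rows shown in the question, defines X and y, and then calls train_test_split. Two things here are assumptions and not part of the original answer: y is left one-dimensional (many scikit-learn estimators expect a 1-D target), and `random_state=0` is set only to make the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the `apple` DataFrame, built from the rows shown in the question.
apple = pd.DataFrame(
    {
        "Closing_Price": [0.53, 0.55, 0.58, 0.63, 0.64],
        "Daily_Returns": [0.058269, 0.037041, 0.053110, 0.082692, 0.015748],
    },
    index=pd.to_datetime(
        ["1980-12-22", "1980-12-23", "1980-12-24", "1980-12-26", "1980-12-29"]
    ),
)
apple.index.name = "Date"

# X must be 2-D with shape (n_samples, n_features), hence the reshape.
X = apple["Closing_Price"].values.reshape(-1, 1)

# Assumption: y is kept 1-D here instead of reshaped to a column.
y = apple["Daily_Returns"].values

# X and y now exist, so the call from the question no longer raises NameError.
# random_state=0 is an assumption, used only for a reproducible split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

The NameError in the question came from X and y never being assigned, not from train_test_split itself; any variable passed to it has to be created first.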