I am trying to predict y based on two features held inside X. After reading my Excel file and splitting my data into columns, my X looks like this:
       SibSp  Parch
    0      1      0
    1      1      0
    2      0      0
    3      1      0
    4      0      0
    5      0      0
    6      0      0
    7      3      1
    8      0      2
    9      1      0
y denotes survival: 1 means survived, 0 means died. X has many more rows. I am using train_test_split(X, y, test_size=0.4, random_state=101) to get training and testing splits, and I have one method to train and one to test. My training code looks like this:
    def train():
        # Get Data Split
        X_train, X_test, y_train, y_test = initData()
        # Create LinearRegression Instance
        lm = LinearRegression()
        # Fit Training Values
        lm.fit(X_train, y_train)
        visualise(X_test, y_test, lm.predict(X_test))
        # Return Trained Data With Testing Data
        return X_test, y_test, lm
My testing code looks like this:
    def test():
        # Get The Trained Classifier
        X, y, lm = train()
        # Fit New Values
        lm.fit(X, y)
        visualise(X, y, lm.predict(X))
This appears to work fine, or so I think. I am now trying to visualise the data as a scatter plot with the prediction line drawn over it.
    def visualise(X, y, predictions):
        features = X.shape[1]
        colors = ['red', 'blue']
        i = 0
        while i <= features - 1:
            plt.scatter(X.iloc[:, i], y, color=colors[i])
            # Update: Forgot to add this line when posting question
            plt.plot(X.iloc[:, i], predictions, color=colors[i])
            i += 1
But this gives me crazy output, with lines going everywhere. I looked online and found sklearn's example. I thought that maybe, because I have two features, I need to handle them separately. This is my attempt to replicate it:
    def visualise(X, y, predictions):
        newY = np.zeros(X.shape[0], X.shape[1]);
        newY[:, 0:1] = newY.iloc[:, 0]
        plt.scatter(X, y, color='blue')
        plt.plot(X, predictions, color='red')
        plt.xticks(())
        plt.yticks(())
        plt.show()
I had to create a newY array because X has two features while y has one, so the shapes were different. But now I am getting an error at the line newY = np.zeros(X.shape[0], X.shape[1]);
TypeError: data type not understood
Update
    def visualise(X, y, predictions):
        newY = np.zeros((X.shape[0], X.shape[1]));
        newY[:, 0] = y
        newY[:, 1] = y
        plt.scatter(X, newY, color='blue')
        plt.plot(X, predictions, color='red')
This fixes the error, but this is my output:
How can I plot my scatter graph and plot a line for my predictions?
As you have two features, you can't draw a single prediction line: the fitted model is a plane over SibSp and Parch, not a line. If anything, you probably want a prediction contour plot.
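To see why the lines go everywhere, note that rows sharing the same SibSp but differing in Parch get different predictions, so a plot against one column at a time has several y values for one x value. A minimal sketch, assuming made-up data in place of the real spreadsheet (only the column names come from the question; the values are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up stand-in for the real data; the values are assumptions for illustration.
X = pd.DataFrame({"SibSp": [1, 1, 0, 0, 3, 0], "Parch": [0, 2, 0, 1, 1, 2]})
y = np.array([1, 0, 1, 0, 0, 1])

lm = LinearRegression().fit(X, y)

# Two rows with the same SibSp but a different Parch.
same_sibsp = pd.DataFrame({"SibSp": [1, 1], "Parch": [0, 2]})
pred = lm.predict(same_sibsp)

# The prediction is intercept + coef[0] * SibSp + coef[1] * Parch, so it moves
# with Parch even when SibSp is fixed.
manual = lm.intercept_ + same_sibsp.to_numpy() @ lm.coef_
print(pred, manual)
```

Because plt.plot also joins the points in row order, unsorted rows add to the zigzag.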
Your case is much closer to this two-feature example: https://scikit-learn.org/stable/auto_examples/svm/plot_iris.html
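A rough sketch of such a contour plot for a two-feature LinearRegression follows. The helper name plot_prediction_contour, the steps parameter, and the random demo data are all assumptions made for illustration, not part of the question's code; the idea is to predict on a grid covering both features and colour the grid by the predicted value.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this line to open a window with plt.show()
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression


def plot_prediction_contour(X, y, model, steps=50):
    """Colour a grid over the two features by the model's prediction, data on top."""
    X = np.asarray(X, dtype=float)
    # Evenly spaced values spanning each feature, with a small margin.
    x0 = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, steps)
    x1 = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, steps)
    xx, yy = np.meshgrid(x0, x1)
    # One row per grid point, in the same column order the model was fitted with.
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    zz = model.predict(grid).reshape(xx.shape)

    fig, ax = plt.subplots()
    filled = ax.contourf(xx, yy, zz, cmap="coolwarm", alpha=0.6)
    fig.colorbar(filled, ax=ax, label="predicted y")
    # Actual observations, coloured by their true label.
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
    ax.set_xlabel("SibSp")
    ax.set_ylabel("Parch")
    return fig, zz


# Random stand-in data (an assumption; replace with the real X and y).
rng = np.random.default_rng(101)
X_demo = rng.integers(0, 4, size=(40, 2))
y_demo = rng.integers(0, 2, size=40)
lm_demo = LinearRegression().fit(X_demo, y_demo)
fig, zz = plot_prediction_contour(X_demo, y_demo, lm_demo)
```

If the model was fitted on a DataFrame, predicting on the plain NumPy grid still works, though scikit-learn may warn about missing feature names.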