I'm trying to run a multiple linear regression, but I'm having trouble plotting the results. When I try to draw my 3D plot I get this error: ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,) and requested shape (34,)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y)
ax.plot(X.iloc[:, 0], X.iloc[:, 1], y_pred, color='red')
ax.set_xlabel('Annual Income (k$)')
ax.set_ylabel('Age')
ax.set_zlabel('Spending Score')
plt.show()
The plot command should be:
ax.plot(X_test.iloc[:, 0], X_test.iloc[:, 1], y_pred, color='red')
because y_pred contains y values only for the subset X_test, not the entire input X.
Plotting with connected lines (ax.plot) doesn't make sense here, because the input data is probably not ordered in a meaningful way, and the test set is certainly not ordered even if the input data was.
I would plot it like this:
from sklearn.model_selection import train_test_split
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate some data as an example.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X['foo'], X['bar'], Y, label='data')
for x0, x1, yt, yp in zip(X_test['foo'], X_test['bar'], y_test, y_pred):
ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')
ax.scatter(X_test['foo'], X_test['bar'], y_pred, color='red', marker='s', label='prediction')
ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
There are other ways to do the visualization. You could use np.meshgrid to generate X values on a grid, get the y values from your predictor, plot the resulting surface with plot_wireframe, and draw both the train and test data with vertical lines indicating their distance from the wireframe. What makes sense depends on the data.
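A minimal sketch of that wireframe idea, reusing the same synthetic data as above. The grid resolution (20×20) and the choice to draw vertical lines for all data points rather than just the test set are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the example above.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)

regressor = LinearRegression()
regressor.fit(X, Y)

# Build a grid covering the data range and predict on every grid point.
g0, g1 = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
grid = pd.DataFrame({'foo': g0.ravel(), 'bar': g1.ravel()})
y_grid = regressor.predict(grid).reshape(g0.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(g0, g1, y_grid, color='gray', alpha=0.5)
ax.scatter(X['foo'], X['bar'], Y, color='blue', label='data')

# Vertical red lines from each observation down to the fitted plane,
# so the residuals are visible in the 3D view.
y_fit = regressor.predict(X)
for x0, x1, yt, yp in zip(X['foo'], X['bar'], Y, y_fit):
    ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')

ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
```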