I'm trying to run a multiple linear regression, but I'm having trouble plotting the results. When I try to draw my 3D plot I get this error: ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,) and requested shape (34,)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y)
ax.plot(X.iloc[:, 0], X.iloc[:, 1], y_pred, color='red')
ax.set_xlabel('Annual Income (k$)')
ax.set_ylabel('Age')
ax.set_zlabel('Spending Score')
plt.show()
The plot command should be:
ax.plot(X_test.iloc[:, 0], X_test.iloc[:, 1], y_pred, color='red')
because y_pred contains y values only for the subset X_test, not the entire input X.
Plotting with connected lines (ax.plot) doesn't make sense here, because the input data is probably not ordered in a meaningful way, and the test set is certainly not ordered even if the input data was.
I would plot it like this:
from sklearn.model_selection import train_test_split
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate some data as an example.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X['foo'], X['bar'], Y, label='data')
for x0, x1, yt, yp in zip(X_test['foo'], X_test['bar'], y_test, y_pred):
ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')
ax.scatter(X_test['foo'], X_test['bar'], y_pred, color='red', marker='s', label='prediction')
ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
There are other ways to do the visualization. You could use np.meshgrid to generate X values on a grid, get the y values from your predictor, plot the resulting surface with plot_wireframe, and draw both the train and test data with vertical lines indicating their distance from the wireframe. What makes sense depends on the data.
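A minimal sketch of that wireframe idea, reusing the same synthetic data as above. The grid resolution (20×20) and the choice to draw vertical lines for all data points rather than just the test set are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the example above.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)

regressor = LinearRegression()
regressor.fit(X, Y)

# Build a grid covering the data range and predict on every grid point.
g0, g1 = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
grid = pd.DataFrame({'foo': g0.ravel(), 'bar': g1.ravel()})
y_grid = regressor.predict(grid).reshape(g0.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(g0, g1, y_grid, color='gray', alpha=0.5)
ax.scatter(X['foo'], X['bar'], Y, color='blue', label='data')

# Vertical red lines from each observation down to the fitted plane,
# so the residuals are visible in the 3D view.
y_fit = regressor.predict(X)
for x0, x1, yt, yp in zip(X['foo'], X['bar'], Y, y_fit):
    ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')

ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
```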