
Python Multiple linear regression can't plot


I'm trying to run a multiple linear regression, but I'm having trouble plotting the results. When I try to draw the 3D plot I get this error:

    ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,) and requested shape (34,)

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y)
ax.plot(X.iloc[:, 0], X.iloc[:, 1], y_pred, color='red')

ax.set_xlabel('Annual Income (k$)')
ax.set_ylabel('Age')
ax.set_zlabel('Spending Score')
plt.show()


Solution

  • The plot command should be:

    ax.plot(X_test.iloc[:, 0], X_test.iloc[:, 1], y_pred, color='red')
    

    because y_pred contains y values only for the subset X_test, not the entire input X.

    Plotting with connected lines (ax.plot) doesn't make sense here anyway: the input data is probably not ordered in any meaningful way, and the test set is certainly not ordered even if the input data was.
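    To see the mismatch directly: predict returns one value per row of X_test, while the failing scatter/plot call passes columns of the full X. A minimal sketch with made-up data (the 40-row size and column names are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Made-up data, just to compare the shapes involved.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(size=(40, 2)), columns=['a', 'b'])
Y = X['a'] + X['b']

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# y_pred has one value per test row, not per input row.
print(len(X), len(X_test), len(y_pred))  # → 40 8 8
```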

    I would plot it like this:

    [Resulting figure: 3D scatter of the data, with test predictions drawn as red squares and red vertical lines to the true values]

    from sklearn.model_selection import train_test_split
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # generate some data as an example.
    np.random.seed(1)
    n = 20
    X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
    Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)
    
    X_train, X_test, y_train,y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
    
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    
    
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(X['foo'], X['bar'], Y, label='data')
    
    for x0, x1, yt, yp in zip(X_test['foo'], X_test['bar'], y_test, y_pred):
        ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')
    
    ax.scatter(X_test['foo'], X_test['bar'], y_pred, color='red', marker='s', label='prediction') 
    
    ax.set_xlabel('X0')
    ax.set_ylabel('X1')
    ax.set_zlabel('y')
    ax.legend()
    plt.show()
    

    There are other ways to visualize this. You could use np.meshgrid to generate a grid of X values, get the corresponding y values from your predictor, and draw the fitted surface with plot_wireframe; the train and test points can then be shown with vertical lines indicating their distance from the wireframe. What makes sense depends on the data.
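    A sketch of that wireframe idea, reusing the same kind of toy data as above (the grid bounds and resolution are arbitrary choices, not part of the original answer):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection
from sklearn.linear_model import LinearRegression

# Same kind of toy data as in the example above.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)

regressor = LinearRegression().fit(X, Y)

# Evaluate the fitted plane on a regular grid covering the data range.
x0, x1 = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
grid = pd.DataFrame({'foo': x0.ravel(), 'bar': x1.ravel()})
y_grid = regressor.predict(grid).reshape(x0.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(x0, x1, y_grid, color='gray', alpha=0.5)
ax.scatter(X['foo'], X['bar'], Y, label='data')

# Vertical lines showing each point's distance from the wireframe.
y_fit = regressor.predict(X)
for xa, xb, yt, yf in zip(X['foo'], X['bar'], Y, y_fit):
    ax.plot([xa, xa], [xb, xb], [yt, yf], color='red')

ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
```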