Tags: python, scikit-learn, linear-regression

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 13 is different from 1)


I was working on the Boston house price problem with linear regression in sklearn, and the following error occurred along the way:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 13 is different from 1)

Code:

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

boston = load_boston()
X = boston.data    # all 13 features
y = boston.target  # house prices

dfX = pd.DataFrame(X, columns = boston.feature_names)
dfy = pd.DataFrame(y, columns = ["Price"] )
df = pd.concat([dfX,dfy],axis =1)

reg = LinearRegression()
reg.fit(X,y)

x_12 = np.array(dfX["LSTAT"]).reshape(-1,1)  # LSTAT only: column index 12 (the 13th feature) of boston.data
y = np.array(dfy["Price"]).reshape(-1,1)

predict = reg.predict(x_12)  # raises the ValueError shown above

Solution

  • The error occurs because LinearRegression's fit was called on all 13 features of the load_boston dataset, while predict is given only one feature (LSTAT). The trained model holds 13 coefficients, so it cannot be applied to a one-column input; this is the "size 13 is different from 1" in the message. Fit the model on the LSTAT column only, so that it expects a single feature as input when predict is called:

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_boston
    from sklearn.linear_model import LinearRegression
    
    boston = load_boston()
    X, y = load_boston(return_X_y=True)
    
    # X will now have only data from "LSTAT" column
    X = X[:, np.newaxis, boston.feature_names.tolist().index("LSTAT")]
    
    dfX = pd.DataFrame(X, columns = ["LSTAT"] )
    dfy = pd.DataFrame(y, columns = ["Price"] )
    df = pd.concat([dfX,dfy],axis =1)
    
    reg = LinearRegression()
    reg.fit(X, y)
    
    x_12 = np.array(dfX["LSTAT"]).reshape(-1, 1)  # same single LSTAT column the model was fitted on
    y = np.array(dfy["Price"]).reshape(-1, 1)
    
    predict = reg.predict(x_12)
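
To see where the message comes from, it may help to reproduce the shape mismatch without sklearn. The sketch below is an illustration rather than sklearn's actual implementation. It assumes that a linear model's prediction boils down to a matrix product of the input with the fitted coefficients plus an intercept, and it uses random data with the same shape as the Boston data (506 rows, 13 features), because load_boston was removed in scikit-learn 1.2. The helpers fit_linear and predict_linear are hypothetical names introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset (assumed shapes: 506 rows, 13 features).
X_all = rng.normal(size=(506, 13))
true_coef = rng.normal(size=13)
y = X_all @ true_coef + 2.0  # noiseless linear target, for illustration only


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (coef, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for the intercept
    solution, *_ = np.linalg.lstsq(A, y, rcond=None)
    return solution[:-1], solution[-1]


def predict_linear(X, coef, intercept):
    """Matrix product of inputs and coefficients, plus the intercept."""
    return X @ coef + intercept


# Model fitted on all 13 features: it holds 13 coefficients.
coef13, b13 = fit_linear(X_all, y)

# One-column input, like the LSTAT column in the question (column index 12).
x_12 = X_all[:, [12]]  # shape (506, 1)

# A (506, 1) input cannot be multiplied with 13 coefficients.
try:
    predict_linear(x_12, coef13, b13)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)

# Option 1: fit on the single column, then predict on that same column.
coef1, b1 = fit_linear(x_12, y)
pred1 = predict_linear(x_12, coef1, b1)

# Option 2: keep the 13-feature model and pass all 13 columns to predict.
pred13 = predict_linear(X_all, coef13, b13)
```

Either option removes the error. Which one is appropriate depends on whether the goal is a one-feature model on LSTAT or a model that uses every feature; in both cases the number of columns passed at predict time has to match the number used at fit time.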