python scikit-learn linear-regression

Python Linear Regression Combination Problem


I need to fit a linear regression and compute the MSE for every pair of variables in my dataframe. The problem is that I can't fit Xtrain with two variables against ytrain with one, even though ytrain only has a single column.

Code:

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01)

Problem:

from itertools import combinations
for c in combinations(range(4), 2):
    lr=LinearRegression()
    lr.fit(Xtrain[:,c].reshape(-1,1),ytrain)
    yp=lr.predict(Xtest[:,c].reshape(-1,1))
    print('MSE', np.sum((ytest - yp)**2) / len(ytest))

Error:

(screenshot of the error message, not reproduced here)


Solution

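  • There is no need to call reshape on the feature matrices. Xtrain[:, c] already selects two columns, so it is two-dimensional with shape (n_samples, 2). Calling reshape(-1, 1) flattens it into a single column with twice as many rows, so the number of rows no longer matches ytrain. If you remove the reshaping, your code will work, see below.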

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from itertools import combinations
    import numpy as np
    
    X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01, random_state=42)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    
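    for c in combinations(range(4), 2):
        # c is a pair of column indices, e.g. (0, 1)
        lr = LinearRegression()
        # X_train[:, c] is already 2-D with shape (n_samples, 2)
        lr.fit(X_train[:, c], y_train)
        yp = lr.predict(X_test[:, c])
        # mean squared error on the held-out test set
        print('MSE', np.sum((y_test - yp) ** 2) / len(y_test))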
    
    # MSE 591.707619290734
    # MSE 33.613143724590564
    # MSE 634.3248475857874
    # MSE 1646.9447686107499
    # MSE 2293.2878076807942
    # MSE 1700.2559702871085
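
To see why the reshape caused the mismatch, it can help to print the shapes directly. The snippet below is only a small sketch: X_demo is a hypothetical stand-in for the training matrix (5 samples, 4 features), not the data from the question.

```python
import numpy as np

# Hypothetical stand-in for the training matrix: 5 samples, 4 features.
X_demo = np.arange(20).reshape(5, 4)
c = (0, 1)  # one pair, as produced by combinations(range(4), 2)

pair = X_demo[:, c]              # two columns, still one row per sample
flattened = pair.reshape(-1, 1)  # all 10 values flattened into one column

print(pair.shape)       # (5, 2)
print(flattened.shape)  # (10, 1)
```

The flattened array has 10 rows while the target would still have 5, which is the inconsistency that fit runs into. reshape(-1, 1) is only needed when a single column has been selected as a one-dimensional array, for example X_demo[:, 0].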