Tags: python, pandas, numpy, ols, multiple-linear-regression

Can I use numpy.polyfit(x, y, deg) for multiple linear regression?


Is there any way I can fit two independent variables and one dependent variable in numpy.polyfit()?

I have a pandas DataFrame that I loaded from a CSV file. I want to use two of its columns as independent variables in a multiple linear regression with NumPy.

Currently my simple linear regression looks like this:

model_combined = np.polyfit(data.Exercise, y, 1)

I wish to include data.Age in x as well.


Solution

  • numpy.polyfit fits a polynomial in a single independent variable, so it cannot take two predictors. Assuming your model is y = a * exercise + b * age + intercept, you can fit a multiple linear regression with NumPy or scikit-learn as follows:

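    import numpy as np
    from sklearn import linear_model

    np.random.seed(42)

    # Toy data: 10 samples, 2 predictors (stand-ins for Exercise and Age)
    X = np.random.randint(low=1, high=10, size=20).reshape(10, 2)
    X = np.c_[X, np.ones(X.shape[0])]  # append a column of ones for the intercept
    y = np.random.randint(low=1, high=10, size=10)

    # Option 1: normal equations, pinv(X^T X) X^T y
    a, b, intercept = np.linalg.pinv(X.T.dot(X)).dot(X.T.dot(y))
    print(a, b, intercept)

    # Option 2: NumPy's least-squares solver
    a, b, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
    print(a, b, intercept)

    # Option 3: scikit-learn; fit_intercept=False because X already has the ones column
    clf = linear_model.LinearRegression(fit_intercept=False)
    clf.fit(X, y)
    print(clf.coef_)  # order: a, b, intercept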
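
The same least-squares approach can be applied to DataFrame columns directly. The sketch below is only an illustration: the DataFrame is a hypothetical stand-in for the CSV data (the column names Exercise and Age come from the question, the values and the target y are synthetic), and the target is built from known coefficients so the fit is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; only the column names come from the question.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Exercise": rng.uniform(0, 10, size=50),
    "Age": rng.uniform(20, 60, size=50),
})

# Synthetic, noise-free target with known coefficients (an assumption for this demo).
y = (2.0 * data["Exercise"] - 0.5 * data["Age"] + 3.0).to_numpy()

# Design matrix: the two predictors plus a column of ones for the intercept.
X = np.column_stack([data["Exercise"], data["Age"], np.ones(len(data))])

# lstsq returns (solution, residuals, rank, singular values); only the solution is needed.
coeffs = np.linalg.lstsq(X, y, rcond=None)[0]
a, b, intercept = coeffs
print(a, b, intercept)
```

With real data, replace the synthetic y with the dependent column from the DataFrame. The order of the returned coefficients follows the column order of X.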