Tags: python, numpy, regression, linear-regression

Linear Regression (No Intercept) RSquare using Numpy


For linear regression with one variable and an intercept, I can compute the R-squared as:

R^2 = (np.sum(((x - np.mean(x)) / np.std(x, ddof=1)) * ((y - np.mean(y)) / np.std(y, ddof=1))) / (len(x) - 1)) ** 2
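
(As a sanity check, this is just the squared Pearson correlation, so it matches np.corrcoef(x, y)[0, 1]**2 — for example, with some made-up data:)

    import numpy as np

    rng = np.random.default_rng(0)   # made-up data, for illustration only
    x = rng.normal(size=50)
    y = 2 * x + rng.normal(size=50)

    r2 = (np.sum(((x - np.mean(x)) / np.std(x, ddof=1))
                 * ((y - np.mean(y)) / np.std(y, ddof=1))) / (len(x) - 1)) ** 2
    print(np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2))  # True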

How do I compute the R-squared for linear regression with one variable and no intercept, without using statsmodels.api OLS, linregress, or any other third-party package? Is the understanding correct that np.mean(y) = 0 for linear regression without an intercept?

What is the fastest way in numpy to get the R-squared for linear regression with one variable and no intercept?


Solution

  • In the case of one variable with no intercept, you could easily do:

    sum(x*y)**2/sum(x*x)/sum(y*y)
    

    In matrix notation this can be written as

     (y @ x)**2/(x @ x * y @ y)
    

    For example:

    import statsmodels.api as sm
    x, y = sm.datasets.get_rdataset('iris').data.iloc[:,:2].values.T
    print(sm.OLS(y,x).fit().rsquared)
    0.9565098243072627
    
    print((y @ x)**2/(x @ x * y @ y))
    0.9565098243072627
    

    Note that the two are equivalent.
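
    To see why (and to answer the np.mean(y) question): the no-intercept slope is beta = sum(x*y)/sum(x*x), and when the model has no constant, statsmodels reports the uncentered R-squared, i.e. the total sum of squares is sum(y*y) rather than sum((y - mean(y))**2). The mean is treated as zero in the formula; it is not assumed that np.mean(y) is actually 0. A minimal residual-based check, reusing the iris data from above:

    import statsmodels.api as sm
    x, y = sm.datasets.get_rdataset('iris').data.iloc[:, :2].values.T

    beta = (x @ y) / (x @ x)              # no-intercept OLS slope
    resid = y - beta * x
    print(1 - (resid @ resid) / (y @ y))  # same 0.9565... as above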


    You could extend the above to include multiple variables:

    import statsmodels.api as sm, numpy as np
    
    dat = sm.datasets.get_rdataset('iris').data
    x = dat.iloc[:,1:4].values
    y = dat.iloc[:,0].values
    
    print(sm.OLS(y, x).fit().rsquared)
    0.9961972754365206
    
    print(y @ x @ np.linalg.solve(x.T @ x, x.T @ y)  / (y @ y))
    0.9961972754365208
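
    As an optional sanity check (and a numerically safer route when x.T @ x is ill-conditioned), the same value can be recovered from the residuals of np.linalg.lstsq, continuing the example above:

    beta = np.linalg.lstsq(x, y, rcond=None)[0]  # no-intercept least-squares fit
    resid = y - x @ beta
    print(1 - (resid @ resid) / (y @ y))         # same 0.99619... as above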