
Getting a series of normal distributions from a least-squares regression


I am not particularly good at math. I would like some breadcrumbs on how to solve the following problem using Python code.

  • Assume an [m,n] matrix M and an [m,1] vector y.
  • Solve for the least-squares fit using scipy.linalg.lstsq(M, y).
  • The output will be an [n,1] vector of coefficients a that best satisfies Ma = y in the least-squares sense, i.e. it minimizes ||Ma - y||^2.
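A minimal sketch of the setup above with matching shapes. The data are made up for illustration: the sizes m = 50 and n = 3, the "true" coefficients a_true and the noise level are all invented, not taken from the question.

```python
import numpy as np
from scipy.linalg import lstsq

# Hypothetical example data: m = 50 observations, n = 3 unknown coefficients.
rng = np.random.default_rng(0)
m, n = 50, 3
M = rng.normal(size=(m, n))                      # design matrix, shape (m, n)
a_true = np.array([1.5, -2.0, 0.5])              # made-up "true" coefficients
y = M @ a_true + rng.normal(scale=0.1, size=m)   # noisy observations, shape (m,)

# lstsq returns (solution, residues, rank, singular values)
a_hat, residues, rank, sv = lstsq(M, y)

print(a_hat.shape)  # one coefficient per column of M
print(rank)         # effective rank of M
print(residues)     # sum of squared residuals (a scalar, since y is 1-D)
```

Note that a_hat has one entry per column of M, so its length is n, while y has one entry per row.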

As per this question, any vector of solutions like a in a regression is basically a series of single points, each drawn from a normal distribution whose spread is set by the errors of the data points around the regression. In effect, every entry of the solution vector a is the mean of a normal distribution, while the errors themselves are normally distributed and centred on zero.
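Under the usual ordinary-least-squares assumptions (independent, normally distributed errors with a common variance σ², and M of full column rank n; these assumptions are not stated in the question), the standard textbook result makes this precise:

```latex
y = M a + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0,\ \sigma^2 I_m)

\hat{a} = (M^\top M)^{-1} M^\top y \ \sim\ \mathcal{N}\!\left(a,\ \sigma^2 (M^\top M)^{-1}\right)

\hat{a}_i \ \sim\ \mathcal{N}\!\left(a_i,\ \sigma^2 \left[(M^\top M)^{-1}\right]_{ii}\right),
\qquad
\hat{\sigma}^2 = \frac{\lVert y - M\hat{a} \rVert_2^2}{m - n}
```

So each coefficient has its own normal distribution, centred on the true value a_i, with a width set by the error variance σ² and by the design matrix M. In practice a_i and σ² are unknown, so the estimate â_i and σ̂² are plugged in.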

I would like to find those normal distributions, rather than just a single scalar value, for each entry of the solution. Apologies for the poor description of the maths; I never studied math at university.


Solution

  • Here is a hint. Let me know if you want more.

    scipy.linalg.lstsq(M, y) returns four things:

    x : (N,) or (N, K) ndarray
      Least-squares solution.
    
    residues : (K,) ndarray or float
      Square of the 2-norm for each column in b - a x, if M > N and ndim(A) == n
      (returns a scalar if b is 1-D). Otherwise a (0,)-shaped array is returned.
    
    rank : int
      Effective rank of a.
    
    s : (min(M, N),) ndarray or None
      Singular values of a. The condition number of a is s[0] / s[-1].
    

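    residues is going to be of interest to you! Note that in the excerpt above, a, b, M and N are lstsq's own names (the coefficient matrix, the right-hand side and its shape), i.e. your M, y, m and n. For a 1-D y with m > n and full column rank, residues is the sum of squared residuals ||y - Ma||^2, and dividing it by m - n estimates the variance of the errors.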

    https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html
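Following the hint through, here is a minimal sketch, assuming independent, normally distributed errors with constant variance, m > n and a full-rank M. The helper name coefficient_distributions and the example data are hypothetical. It divides residues by m - n to estimate σ², scales (MᵀM)⁻¹ by that estimate to get the covariance of the coefficients, and wraps each coefficient in a frozen scipy.stats.norm distribution.

```python
import numpy as np
from scipy.linalg import lstsq
from scipy.stats import norm


def coefficient_distributions(M, y):
    """Hypothetical helper: approximate normal distribution of each coefficient.

    Assumes y = M a + noise with independent N(0, sigma^2) noise,
    more rows than columns (m > n) and M of full column rank.
    """
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = M.shape
    a_hat, residues, rank, sv = lstsq(M, y)
    if m <= n or rank < n:
        raise ValueError("need m > n and full column rank for residues to be returned")

    # residues = ||y - M a_hat||^2; dividing by the degrees of freedom (m - n)
    # gives an unbiased estimate of the error variance sigma^2.
    sigma2_hat = float(residues) / (m - n)

    # Covariance of a_hat is sigma^2 (M^T M)^{-1}; its diagonal holds the variances.
    cov = sigma2_hat * np.linalg.inv(M.T @ M)
    std_err = np.sqrt(np.diag(cov))

    # One frozen normal distribution per coefficient, centred on the estimate.
    dists = [norm(loc=mu, scale=se) for mu, se in zip(a_hat, std_err)]
    return dists, cov


# Usage with made-up data: 200 observations, 2 coefficients.
rng = np.random.default_rng(1)
M = rng.normal(size=(200, 2))
y = M @ np.array([3.0, -1.0]) + rng.normal(scale=0.5, size=200)

dists, cov = coefficient_distributions(M, y)
for i, d in enumerate(dists):
    lo, hi = d.interval(0.95)
    print(f"a[{i}] ~ N({d.mean():.3f}, {d.std():.3f}^2), 95% interval [{lo:.3f}, {hi:.3f}]")
```

Strictly speaking, because σ² is estimated from the same data, each standardized coefficient follows a Student's t distribution with m - n degrees of freedom; the normal is a close approximation when m - n is large. np.linalg.inv is used here for simplicity; if the condition number s[0] / s[-1] reported by lstsq is large, (MᵀM)⁻¹ will be numerically unreliable.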