pythonscipystatisticsregressiongaussian

# Getting a series of Normal Distributions from a Least Squares regression

I am not particularly good at math. I would like to get some breadcrumbs about how to solve the following formula using python code.

• Assume an [m,n] matrix M and a [1,n] vector y.
• Solve for the least squares using scipy.linalg.lstsq(M, y).
• The output will be an [m,1] vector of coefficients a in the equation Ma=y.

As per this question, any vector of solutions like a in a regression is basically a series of single points each taken from a normal distribution that describes the error of every point on the regression. In effect, every single digit in the solution vector a is the mean of a normal distribution of errors centred on zero.

I would like to find those normal distributions rather than the scalar value for every single point in the solution. Apologies for the poor description of the mathy bits, I was never trained in math in Uni.

Solution

• Here is a hint. Let me know if you want more.

scipy.linalg.lstsq(M, y) returns four things:

x : (N,) or (N, K) ndarray
Least-squares solution.

residues : (K,) ndarray or float
Square of the 2-norm for each column in b - a x, if M > N and ndim(A) == n
(returns a scalar if b is 1-D). Otherwise a (0,)-shaped array is returned.

rank : int
Effective rank of a.

s : (min(M, N),) ndarray or None
Singular values of a. The condition number of a is s[0] / s[-1].

residues is going to be of interest to you!

https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html