Tags: algorithm, matrix, least-squares, matrix-inverse

Linear Regression - What algorithm should I use to solve the least-squares problem: matrix inverse, LU, or ...?


I am working on an algorithm to perform linear regression with one or more independent variables.

That is (if I have m real-world observations, and, in the case of two independent variables a and b):

C + D*a1 + E*b1 = y1

C + D*a2 + E*b2 = y2

...

C + D*am + E*bm = ym

I would like to use the least-squares solution to find the best-fitting line.

I will be using matrix notation, so the system to solve is

X^T X * Beta = X^T y

where Beta is the vector [C, D, E], whose entries are the coefficients of the best-fit line.
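
Spelled out for the two-variable case above (my own expansion of the notation), X is the m-by-3 design matrix whose i-th row is [1, a_i, b_i], and y collects the observations:

            [ 1  a1  b1 ]            [ C ]         [ y1 ]
        X = [ 1  a2  b2 ] ,   Beta = [ D ] ,   y = [ y2 ]
            [    ...    ]            [ E ]         [ ...]
            [ 1  am  bm ]                          [ ym ]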

Question: What is the best way to solve this formula? Should I compute the inverse of X^T X,

or should I use an LU factorization/decomposition of the matrix? What is the performance of each on a large amount of data (i.e. a large value of m, possibly on the order of 10^8)?

EDIT

If the answer is to use a Cholesky decomposition or a QR decomposition, are there any implementation hints or simple libraries to use? I am coding in C/C++.


Solution

  • Your X^T X matrix is symmetric positive (semi-)definite, so it has a Cholesky decomposition. I'd look into that decomposition before LU: it needs roughly half the operations of LU for a matrix of this kind. See http://en.wikipedia.org/wiki/Cholesky_decomposition and the sketch below.
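
As a concrete starting point, here is a minimal sketch using the Eigen C++ library (my choice of library, not mentioned in the original post; the sample data values are made up for illustration). It builds the design matrix, forms the normal equations, and solves them with Eigen's LDLT solver, a Cholesky-type factorization:

    // Minimal sketch: least squares via the normal equations and a
    // Cholesky-type (LDLT) factorization, using the Eigen library.
    #include <Eigen/Dense>
    #include <iostream>
    #include <vector>

    int main() {
        // Hypothetical sample data: m observations of (a, b, y).
        std::vector<double> a = {1.0, 2.0, 3.0, 4.0};
        std::vector<double> b = {2.0, 1.0, 4.0, 3.0};
        std::vector<double> y = {6.1, 7.9, 14.2, 15.8};
        const int m = static_cast<int>(y.size());

        // Design matrix X (m x 3): each row is [1, a_i, b_i].
        Eigen::MatrixXd X(m, 3);
        Eigen::VectorXd Y(m);
        for (int i = 0; i < m; ++i) {
            X(i, 0) = 1.0;
            X(i, 1) = a[i];
            X(i, 2) = b[i];
            Y(i) = y[i];
        }

        // Solve (X^T X) * beta = X^T y. X^T X is only 3x3, so the
        // factorization itself is cheap; for large m the dominant cost
        // is forming X^T X and X^T y.
        Eigen::VectorXd beta = (X.transpose() * X).ldlt().solve(X.transpose() * Y);

        std::cout << "C = " << beta(0) << ", D = " << beta(1)
                  << ", E = " << beta(2) << std::endl;
        return 0;
    }

For m on the order of 10^8 you would not need to hold the full m-by-3 matrix in memory: X^T X and X^T y can be accumulated row by row (or in chunks) and the small 3x3 system solved at the end. If X^T X turns out to be ill-conditioned, Eigen's X.colPivHouseholderQr().solve(Y) solves the least-squares problem directly via QR; it is slower but more numerically stable than the normal equations.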