Search code examples
matlaboctaveregressionlinear-regressionleast-squares

Multivar linear regression should be mathematically undetermined (Octave)


I apologize in advance for the rather abstract nature of my question, but it is indirectly a question about programming algorithms, and I don't think I'll be the only programmer to wonder about this.

This is about the implementation of the multi-variable ordinary least squares (OLS) regression algorithm in Octave (and, I assume, in MatLab as well). As far as I can tell, if one inputs two variables into a linear regression with just one single measurement, the result (i.e. the coefficients) should be mathematically undetermined: unless you accept black magic as a valid premise, how could one possibly tell in which way each of the variables affects the final result? In the more general case, the number of measurements must (I think) be at least equal to the number of variables for the resulting coefficients to make any sense (let alone statistical errors and all that).

Octave, however, is all too happy to compute a result, with no warnings whatsoever:

octave:1> ols([1], [1, 1])
ans =

   0.50000
   0.50000

In other words -- if I got this right -- given the equation 1 = x + y, Octave joyfully concludes that x = y = 0.5.

As such, assuming (as I am) that Octave has no direct connection to Satan, here are my questions:

  1. Am I misunderstanding the mathematical foundation? In other words, is this possibly a legitimate result?
  2. If I'm right, why isn't Octave spitting an error -- or, at the very least, quite a stern warning regarding the totally moronic data I'm asking it to analyze?

Solution

  • take a look at this Octave documentation:

    http://www.gnu.org/software/octave/doc/interpreter/Linear-Least-Squares.html

    In the description of output beta, it says that the value will be the pseudo-inverse of x times y when the matrix is not of full-rank (as is your case for matrix [1, 1]. [0.5; 0.5] is the pseudo-inverse of [1, 1].

    Hope that helps!