hadoop, mapreduce, mahout

Why doesn't Mahout have linear regression yet?


I am just starting to work with Mahout, and one thing that perplexes me a great deal is the lack of linear regression. Even logistic regression, which is much harder, is supported to some degree, with research still going on, but there is nothing at all on the linear regression front!

From what I understand, OLS is one of the easiest problems to solve -

Y = Xb + e

has the least-squares solution b = (X^T X)^(-1) X^T Y, where X^T is the transpose of X. If the matrix X^T X turns out to be singular (i.e. not invertible), it would be perfectly fine to show an error message, even though a solution using a generalized inverse exists.

Computing both X^T X and X^T Y comes down to sums and sums of products of elements, which is probably the easiest thing to do with MapReduce, as I understand it.
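
To make the "sums of products" point concrete, here is a minimal single-process sketch in plain Python rather than actual Hadoop or Mahout code. Each chunk of rows plays the role of one mapper's input, and the partial X^T X and X^T Y accumulators are added together the way a reducer would add them. The function names (`map_partial_sums`, `reduce_partial_sums`, `solve`) and the toy data are made up for illustration, and the small Gaussian-elimination solver is just one simple way to handle the final p-by-p system.

```python
from functools import reduce


def map_partial_sums(rows):
    """Map step: accumulate X^T X and X^T Y over one chunk of (x, y) rows."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_partial_sums(a, b):
    """Reduce step: element-wise addition of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b, tol=1e-12):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            # The "just show an error message" case from the question.
            raise ValueError("X^T X is singular (or nearly so)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical).
# The leading 1.0 in each row is the intercept column.
chunks = [
    [((1.0, 0.0, 0.0), 1.0), ((1.0, 1.0, 0.0), 3.0)],
    [((1.0, 0.0, 1.0), 4.0), ((1.0, 1.0, 1.0), 6.0), ((1.0, 2.0, 1.0), 8.0)],
]
xtx, xty = reduce(reduce_partial_sums, map(map_partial_sums, chunks))
beta = solve(xtx, xty)
print(beta)  # close to [1.0, 2.0, 3.0], up to floating-point rounding
```

Only the p-by-p matrix and the length-p vector ever need to be combined, so the amount of data shuffled is independent of the number of rows. One caveat: forming X^T X squares the condition number of X, so on ill-conditioned data a QR- or SVD-based solver would likely be the safer choice.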

(Which makes me think... is there any module that supports the native matrix operations required to compute regression coefficients? That would indeed make a regression module unnecessary...)

Am I missing something which makes regression hard to compute in Mahout?


Solution

  • I don't know if there's a "why" to things like this. It just doesn't exist.

    However, I think it's the opposite of what you suppose: it's too "easy". Unless you're solving a system of ten million equations, it's probably not at a scale that calls for Hadoop. There are plenty of existing packages that can do this really well on one machine. If you want something in Java that is also from Apache, just look at Commons Math, for example.

    Not to say there couldn't be a fine non-distributed version in the project, but since the emphasis is mostly big-scale and Hadoop, that's probably "why".