I am trying to do machine learning with Mahout in Java. I have pulled all the data I need from MySQL. I get stuck once my variable of type SparseRowMatrix has had all its computations and rearrangements applied: I simply do not understand how to call either of the two classes that seem suitable:
1) org.apache.mahout.math.decomposer.lanczos.LanczosSolver
2) org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
Any advice is helpful at this point!
DistributedLanczosSolver implements the Hadoop Tool interface, so you can run it as a regular Hadoop job, for example:
hadoop jar $MAHOUT_HOME/mahout-examples-0.5-job.jar org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver --input /path/to/input --output /path/to/output --numCols 42 --numRows 42 --cleansvd "true" --rank 5
You could also call this directly from Java using:
ToolRunner.run(new DistributedLanczosSolver().job(), args);
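When calling it from Java, the args array handed to ToolRunner.run carries the same flags as the command line above. Below is a minimal sketch of assembling that array. LanczosArgs and build are hypothetical helpers, not part of Mahout, and the real Mahout call is left as a comment because it needs Mahout and Hadoop on the classpath. The numCols and numRows values should match the dimensions of your matrix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): builds the same arguments as the command line above.
// With Mahout and Hadoop on the classpath you would then call:
//   ToolRunner.run(new DistributedLanczosSolver().job(), args);
class LanczosArgs {

    static String[] build(String input, String output, int numCols, int numRows,
                          boolean cleanSvd, int rank) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);                       // path to the input matrix
        args.add("--output");
        args.add(output);                      // where the job writes its results
        args.add("--numCols");
        args.add(Integer.toString(numCols));   // should match your matrix's column count
        args.add("--numRows");
        args.add(Integer.toString(numRows));   // should match your matrix's row count
        args.add("--cleansvd");
        args.add(Boolean.toString(cleanSvd));  // "true" or "false"
        args.add("--rank");
        args.add(Integer.toString(rank));      // desired rank
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        String[] args = build("/path/to/input", "/path/to/output", 42, 42, true, 5);
        // Prints: --input /path/to/input --output /path/to/output --numCols 42 --numRows 42 --cleansvd true --rank 5
        System.out.println(String.join(" ", args));
    }
}
```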
Alternatively, if you don't need to run it in a distributed fashion, the LanczosSolver.solve method is what you are looking for: you pass it your matrix along with the objects that will receive the eigenvectors and eigenvalues. Internally it runs the Lanczos algorithm, which is fairly involved and which I won't try to explain here, so I recommend reading the source code directly for more detail.
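For intuition about the kind of computation LanczosSolver performs, here is a minimal, self-contained textbook sketch of Lanczos iteration in plain Java. It is not Mahout's implementation and uses no Mahout classes; LanczosSketch, lanczos, countBelow and largestEigenvalue are hypothetical names. It runs k steps on a small dense symmetric matrix to build the diagonal (alpha) and off-diagonal (beta) of a tridiagonal matrix T, then finds the largest eigenvalue of T by Sturm-sequence bisection. It deliberately omits the reorthogonalization that practical implementations need on larger problems.

```java
import java.util.Arrays;
import java.util.Random;

// Textbook Lanczos sketch (NOT Mahout's code): dense symmetric matrix, no reorthogonalization.
class LanczosSketch {

    // y = A x for a dense square matrix A
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += a[i][j] * x[j];
        return y;
    }

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    // Runs up to k Lanczos steps from a random unit start vector.
    // Returns {alpha, beta}: the diagonal and off-diagonal of the tridiagonal matrix T.
    static double[][] lanczos(double[][] a, int k, long seed) {
        int n = a.length;
        double[] alpha = new double[k];
        double[] beta = new double[k];           // only the first (steps - 1) entries are used
        Random rnd = new Random(seed);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = rnd.nextDouble() - 0.5;
        double norm = Math.sqrt(dot(v, v));
        for (int i = 0; i < n; i++) v[i] /= norm;
        double[] vPrev = new double[n];
        double betaPrev = 0.0;
        int steps = 0;
        while (steps < k) {
            double[] w = multiply(a, v);
            alpha[steps] = dot(w, v);
            // Three-term recurrence: remove the components along v and the previous basis vector
            for (int i = 0; i < n; i++) w[i] -= alpha[steps] * v[i] + betaPrev * vPrev[i];
            steps++;
            double b = Math.sqrt(dot(w, w));
            if (steps == k || b < 1e-12) break;  // done, or an invariant subspace was found
            beta[steps - 1] = b;
            for (int i = 0; i < n; i++) w[i] /= b;
            vPrev = v;
            v = w;
            betaPrev = b;
        }
        return new double[][] {Arrays.copyOf(alpha, steps), Arrays.copyOf(beta, steps - 1)};
    }

    // Number of eigenvalues of T strictly below x, via the Sturm sequence of the tridiagonal T.
    static int countBelow(double[] alpha, double[] beta, double x) {
        int count = 0;
        double d = 1.0;
        for (int i = 0; i < alpha.length; i++) {
            double b2 = (i == 0) ? 0.0 : beta[i - 1] * beta[i - 1];
            d = (alpha[i] - x) - b2 / d;
            if (d == 0.0) d = -Double.MIN_NORMAL;  // avoid dividing by zero on the next step
            if (d < 0) count++;
        }
        return count;
    }

    // Largest eigenvalue of T by bisection on the Sturm count, starting from Gershgorin bounds.
    static double largestEigenvalue(double[] alpha, double[] beta) {
        int m = alpha.length;
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < m; i++) {
            double r = (i > 0 ? Math.abs(beta[i - 1]) : 0.0) + (i < m - 1 ? Math.abs(beta[i]) : 0.0);
            lo = Math.min(lo, alpha[i] - r);
            hi = Math.max(hi, alpha[i] + r);
        }
        for (int iter = 0; iter < 200; iter++) {
            double mid = 0.5 * (lo + hi);
            if (countBelow(alpha, beta, mid) == m) hi = mid;  // every eigenvalue is below mid
            else lo = mid;                                    // the largest one is >= mid
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // 1-D Laplacian: eigenvalues are 2 - 2cos(j*pi/5), j = 1..4; the largest is about 3.618034
        double[][] a = {
            {2, -1, 0, 0},
            {-1, 2, -1, 0},
            {0, -1, 2, -1},
            {0, 0, -1, 2}
        };
        double[][] t = lanczos(a, 4, 42L);
        System.out.printf("largest eigenvalue ~ %.6f%n", largestEigenvalue(t[0], t[1]));
    }
}
```

With k equal to the matrix size and no breakdown, T is orthogonally similar to A, so the two share their eigenvalues. In practice k is much smaller than the dimension and only the extreme eigenvalues converge well. To get singular values of a rectangular matrix A, the same iteration is typically run on A^T A.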