hadoop mapreduce mahout

How can I compute a distance matrix with a Mahout MapReduce job?


I have this input file:

id, feature1, feature2, ...
0, 0, 1, 1, 0, 0, 0, ...
1, 0, 0, 1, 0, 1, 0, ...
2, 1, 0, 0, 0, 0, 0, ...
3, 0, 0, 0, 0, 1, 0, ...

I want to compute its distance matrix with a MapReduce job, using Hadoop or Mahout, but I cannot find a Mahout method that calculates a distance matrix. What should I do?

Thank you for your help.


Solution

  • You can calculate the distance between each pair of records yourself using Mahout. Call the distance method of a DistanceMeasure implementation on each pair of record vectors, but you have to convert the input file into a SequenceFile first.
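
To make the pairwise step concrete, here is a minimal standalone sketch in plain Java, with no Hadoop or Mahout dependency. It assumes Euclidean distance and uses only the six feature values visible in the question (the elided columns are left out). The class and method names (DistanceMatrixSketch, parseFeatures, euclidean, distanceMatrix) are hypothetical, chosen for illustration. In an actual Mahout job, the euclidean helper would be replaced by a call to DistanceMeasure.distance(Vector, Vector), for example on an EuclideanDistanceMeasure, and the nested loop would be distributed across mappers and reducers rather than run in memory.

```java
// Standalone sketch: plain Java, no Hadoop or Mahout on the classpath.
// All names here are hypothetical and only illustrate the pairwise logic.
final class DistanceMatrixSketch {

    // Parses one input line such as "1, 0, 0, 1, 0, 1, 0" and returns the
    // feature values only; the leading id column is dropped.
    static double[] parseFeatures(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i].trim());
        }
        return features;
    }

    // Euclidean distance between two equal-length feature vectors.
    // Assumption: Euclidean is the desired measure; the question does not say.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Builds the full symmetric n x n matrix; the diagonal stays 0.
    static double[][] distanceMatrix(double[][] rows) {
        int n = rows.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = euclidean(rows[i], rows[j]);
                matrix[i][j] = d;
                matrix[j][i] = d;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // The four sample records from the question, truncated to the
        // feature columns that are actually shown.
        String[] lines = {
            "0, 0, 1, 1, 0, 0, 0",
            "1, 0, 0, 1, 0, 1, 0",
            "2, 1, 0, 0, 0, 0, 0",
            "3, 0, 0, 0, 0, 1, 0"
        };
        double[][] rows = new double[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            rows[i] = parseFeatures(lines[i]);
        }
        double[][] matrix = distanceMatrix(rows);
        for (double[] row : matrix) {
            StringBuilder sb = new StringBuilder();
            for (double d : row) {
                sb.append(String.format("%.3f ", d));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Computing only the upper triangle and mirroring it halves the number of distance calls, which matters because the number of pairs grows quadratically with the number of records.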