I have been studying the k-means algorithm in Mahout 0.9 that runs on MapReduce, and I would like to know where I can find the code that runs inside the map function and inside the reducer.
I tried debugging it with NetBeans, but I was not able to find what exactly is implemented in the Map and Reduce functions.
I am doing this because I would like to know exactly what is implemented in Mahout 0.9, so that I can see which parts of the k-means MapReduce algorithm were optimized.
If somebody knows which research paper Mahout's k-means is based on, that would also help me a lot.
Thank you so much!
Best regards!
Download the source code for mahout-core and search for the Java file org.apache.mahout.clustering.kmeans.KMeansDriver.
In this file, search for the line ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
The iterateMR method of the class org.apache.mahout.clustering.iterator.ClusterIterator is where all the configuration required for the MapReduce job is defined.
org.apache.mahout.clustering.iterator.CIMapper and org.apache.mahout.clustering.iterator.CIReducer are the Map and Reduce classes you are looking for.
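To make it easier to recognize the logic while stepping through CIMapper and CIReducer in the debugger, here is a simplified, in-memory Java sketch of what one k-means iteration does conceptually when it is split into map and reduce steps: the map step assigns each point of its input split to the nearest center and keeps a running sum and count per cluster, and the reduce step merges those partial results and recomputes each center as the mean. This is not Mahout's code. The class and method names (KMeansMapReduceSketch, map, reduce, iterate), the squared Euclidean distance, the convergence test on how far the centers move, and the rule that an empty cluster keeps its old center are all assumptions made for illustration. Mahout's real classes work through their own abstractions, so the actual code will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified in-memory sketch of k-means as map/reduce rounds (NOT Mahout's actual code). */
public class KMeansMapReduceSketch {

    /** Partial aggregate for one cluster: running vector sum and point count. */
    static final class Partial {
        final double[] sum;
        long count;
        Partial(int dim) { sum = new double[dim]; }
        void add(double[] v) {
            for (int i = 0; i < v.length; i++) sum[i] += v[i];
            count++;
        }
        void merge(Partial other) {
            for (int i = 0; i < sum.length; i++) sum[i] += other.sum[i];
            count += other.count;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    /** Index of the closest center (squared Euclidean distance is an assumption here). */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) best = c;
        }
        return best;
    }

    /** "Map" over one input split: assign each point to its nearest center, sum per cluster. */
    static Map<Integer, Partial> map(List<double[]> split, double[][] centers) {
        Map<Integer, Partial> partials = new HashMap<>();
        for (double[] p : split) {
            partials.computeIfAbsent(nearest(p, centers), k -> new Partial(p.length)).add(p);
        }
        return partials;
    }

    /** "Reduce": merge the partials of all mappers per cluster and compute the new means. */
    static double[][] reduce(List<Map<Integer, Partial>> mapOutputs, double[][] oldCenters) {
        int dim = oldCenters[0].length;
        Partial[] merged = new Partial[oldCenters.length];
        for (Map<Integer, Partial> out : mapOutputs) {
            for (Map.Entry<Integer, Partial> e : out.entrySet()) {
                int c = e.getKey();
                if (merged[c] == null) merged[c] = new Partial(dim);
                merged[c].merge(e.getValue());
            }
        }
        double[][] next = new double[oldCenters.length][];
        for (int c = 0; c < next.length; c++) {
            if (merged[c] == null) {
                next[c] = oldCenters[c].clone(); // empty cluster keeps its old center (assumption)
                continue;
            }
            next[c] = new double[dim];
            for (int i = 0; i < dim; i++) next[c][i] = merged[c].sum[i] / merged[c].count;
        }
        return next;
    }

    /** Driver: one map/reduce round per iteration, until maxIterations or no center moves more than delta. */
    public static double[][] iterate(List<List<double[]>> splits, double[][] centers,
                                     int maxIterations, double delta) {
        for (int it = 0; it < maxIterations; it++) {
            List<Map<Integer, Partial>> outputs = new ArrayList<>();
            for (List<double[]> split : splits) outputs.add(map(split, centers));
            double[][] next = reduce(outputs, centers);
            boolean converged = true;
            for (int c = 0; c < centers.length; c++) {
                if (Math.sqrt(squaredDistance(centers[c], next[c])) > delta) converged = false;
            }
            centers = next;
            if (converged) break;
        }
        return centers;
    }

    public static void main(String[] args) {
        List<List<double[]>> splits = Arrays.asList(
            Arrays.asList(new double[] {0, 0}, new double[] {0, 1}, new double[] {10, 10}),
            Arrays.asList(new double[] {1, 0}, new double[] {10, 11}, new double[] {11, 10}));
        double[][] centers = iterate(splits, new double[][] {{0, 0}, {10, 10}}, 10, 1e-3);
        for (double[] c : centers) System.out.println(Arrays.toString(c));
    }
}
```

Summing per cluster inside the map step means each mapper only has to send one partial sum and count per cluster to the reducers instead of every point, which is the usual way to keep the shuffle small in MapReduce k-means. When reading CIMapper and CIReducer, it helps to look for where the equivalent of that per-cluster aggregation happens.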
Hope this helps!! :)
However, I do not know which research paper the implementation is based on.