mapreduce cluster-analysis code-analysis mahout k-means

Mahout 0.9 K-Means MapReduce: analysis of the algorithm


I have been studying the MapReduce implementation of k-means in Mahout 0.9, and I would like to know where I can find the code that runs inside the map function and the reducer.

I tried stepping through it with the NetBeans debugger, but I was not able to find what exactly is implemented in the Map and Reduce functions...

I am doing this because I would like to know exactly what is implemented in Mahout 0.9, in order to see which parts of the K-Means MapReduce algorithm were optimized.

If somebody knows which research paper the Mahout K-means implementation is based on, that would also help me a lot.

Thank you so much!

Best regards!


Solution

  • Download the source code for mahout-core and open the Java file for org.apache.mahout.clustering.kmeans.KMeansDriver.

    In this file, search for the line ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);

    The iterateMR method in class org.apache.mahout.clustering.iterator.ClusterIterator is where all the configuration required for the MapReduce job is defined.

    org.apache.mahout.clustering.iterator.CIMapper and org.apache.mahout.clustering.iterator.CIReducer are the map and reduce classes you are looking for.

    Hope this helps!! :)

    However, I do not know which research paper the implementation is based on.
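
To have something to compare against while reading CIMapper and CIReducer, here is a minimal sketch of one textbook k-means iteration written in map/reduce style. It is plain Java and is not Mahout's code: the class name KMeansMapReduceSketch and all of its methods are made up for illustration, and it assumes the simplest formulation (map assigns each point to its nearest centroid, reduce averages the points per centroid). The actual Mahout classes work with Mahout's own vector and cluster types and may organize the work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, no Hadoop or Mahout classes involved.
class KMeansMapReduceSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // "Map" step: return the index of the centroid closest to the point.
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // "Reduce" step: average all points that were assigned to one centroid.
    static double[] mean(List<double[]> points) {
        double[] result = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < result.length; i++) {
                result[i] += p[i];
            }
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= points.size();
        }
        return result;
    }

    // One full iteration: map, group by centroid index (the shuffle), then reduce.
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearestCentroid(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] updated = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> assigned = groups.get(c);
            // A centroid with no assigned points keeps its previous position.
            updated[c] = (assigned == null) ? centroids[c].clone() : mean(assigned);
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 3}, {9, 9}, {11, 9}};
        double[][] centroids = {{0, 0}, {10, 10}};
        // prints [[1.0, 2.0], [10.0, 9.0]]
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
    }
}
```

A driver would call iterate repeatedly until the centroids stop moving or a maximum number of iterations is reached, which is the role the maxIterations argument suggests in the iterateMR call above.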