
Running Mahout k-means on a Hadoop multi-node cluster


I am running k-means on a multi-node cluster. The input size is about 100 MB, and I have modified the bin/mahout file like this:

.

.

.

MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=10MB"

.

.

MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=10"

On each iteration I get:

12/09/12 17:05:02 INFO mapred.JobClient: Launched map tasks=1

12/09/12 17:05:02 INFO mapred.JobClient: Launched reduce tasks=6

12/09/12 17:05:02 INFO mapred.JobClient: Data-local map tasks=1

Does this mean that the job runs on a single node instead of on multiple nodes? And if so, what am I missing in the configuration?


Solution

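  • Surely you want to set the maximum split size (mapred.max.split.size) rather than the minimum if you want more splits. Raising the minimum can only make splits larger, whereas lowering the maximum makes them smaller and therefore more numerous. The value is read as a plain number of bytes, so write 10485760 rather than 10MB. It is still only a suggestion to the cluster.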
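
To see why the maximum is the setting that matters, here is a minimal sketch of the split-size rule used by Hadoop's new-API FileInputFormat, which picks max(minSize, min(maxSize, blockSize)). The 64 MB block size is an assumption (the Hadoop 1.x default), the 100 MB input comes from the question, and the function names split_size and num_splits are hypothetical helpers, not part of Hadoop or Mahout. The split count is approximated by a ceiling division; Hadoop itself allows the last split to run slightly over.

```shell
#!/bin/sh
# Sketch only: mirrors the rule max(minSize, min(maxSize, blockSize)).
# All sizes are in bytes.

# split_size MIN MAX BLOCK -> prints the resulting split size
split_size() {
  s=$2                                    # start from the maximum split size
  if [ "$3" -lt "$s" ]; then s=$3; fi     # min(maxSize, blockSize)
  if [ "$1" -gt "$s" ]; then s=$1; fi     # max(minSize, ...)
  echo "$s"
}

# num_splits INPUT SPLIT -> prints ceil(INPUT / SPLIT), an approximation
num_splits() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

INPUT=104857600   # 100 MB input, as in the question
BLOCK=67108864    # 64 MB block size (assumed)

# Maximum left at the block size: splits follow the block size.
num_splits "$INPUT" "$(split_size 1 "$BLOCK" "$BLOCK")"      # prints 2

# Maximum lowered to 10 MB: many more, smaller splits.
num_splits "$INPUT" "$(split_size 1 10485760 "$BLOCK")"      # prints 10

# Minimum raised to 10 MB instead: no effect, still block-sized splits.
num_splits "$INPUT" "$(split_size 10485760 "$BLOCK" "$BLOCK")"  # prints 2
```

In the style of the edit shown in the question, the corresponding line would be something like MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.max.split.size=10485760". Whether options set through MAHOUT_OPTS actually reach the job configuration may depend on the Mahout version, so treat that as an assumption to verify against the job's configuration.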