How to increase the number of reducers in the canopy clustering algorithm

I'm running the canopy clustering algorithm using Mahout.

This is the command I'm running from the Mahout command line:

mahout canopy -i /mahout/o_seqsparse/tfidf-vectors -o /mahout/o_canopy -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -ow -t1 100 -t2 50

Below is the number of map and reduce tasks that run:

No. of map tasks running --> 6

No. of reduce tasks running --> 1

But this takes too much time because there is only one reducer. I think that if I can increase the number of reduce tasks, I will get better performance.

I also tried increasing the number of map and reduce tasks by setting mapred.map.tasks and mapred.reduce.tasks in mapred-site.xml, but this had no effect: the job still runs with 1 reducer.

Solution

You didn't specify the version of Mahout you are using, but look at the source code of 0.4 here: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.4/org/apache/mahout/clustering/canopy/CanopyDriver.java

You will find that the number of reducers is hard-coded to 1. I don't think you can override it through configuration.

EDIT

For version 0.9, as you specified, check http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.9/org/apache/mahout/clustering/canopy/CanopyDriver.java/ at line no. 354:

job.setNumReduceTasks(1);

Modify this line and rebuild. However, the map output must still be sent to a single reducer, so for this clustering job I don't believe you will benefit much from increasing the number of reducers.
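To illustrate why more reducers would not help, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes, as far as I recall from CanopyMapper, that every mapper emits its canopy centroids under one constant key (shown here as "centroid"). The class and method names are hypothetical, and the partition arithmetic mirrors Hadoop's default HashPartitioner but is applied to a String instead of a Text key. The point is only that identical keys always land in the same partition, whatever the reducer count.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class, not part of Mahout or Hadoop.
public class SingleKeyPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner,
    // applied to a plain String key for illustration.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many distinct reducers would actually receive records.
    static int reducersUsed(String[] keys, int numReduceTasks) {
        Set<Integer> used = new HashSet<>();
        for (String key : keys) {
            used.add(partitionFor(key, numReduceTasks));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Assumption: six mappers, each emitting under the same constant key.
        String[] keys = {"centroid", "centroid", "centroid",
                         "centroid", "centroid", "centroid"};
        // Even with 4 reduce tasks configured, only one receives data.
        System.out.println(reducersUsed(keys, 4)); // prints 1
    }
}
```

If that assumption holds, setting the reducer count to 4 would just start three reducers with no input while one does all the work, so the running time would not improve.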