
How to increase number of mappers in Mahout MatrixMultiplicationJob?


I am using Mahout 0.7's MatrixMultiplicationJob to multiply a large matrix, but it always uses 1 map task, which makes it slow. This is probably due to the InputSplit, which forces the number of mappers to be 1.

Is there a way I can efficiently multiply matrices in Hadoop / Mahout or change the number of mappers?


Solution

  • Ultimately, it is Hadoop that decides how many mappers to use. By default it runs one mapper per input split, and a split normally corresponds to one HDFS block (typically 64 or 128 MB). If your input is smaller than one block, it gets a single mapper, and it is usually too small to benefit from more.

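    You can encourage it to use more anyway by setting mapred.max.split.size to something smaller than the block size (remember the value is set in bytes, not MB, so 16 MB is 16777216). On Hadoop 2 and later the same setting is named mapreduce.input.fileinputformat.split.maxsize. But are you sure you want to? It is much more common to need more reducers than more mappers, since Hadoop defaults to a single reducer and will never use more than 1 unless you (or your job) tell it to.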

    Also note that Hadoop cannot use more than one mapper on a single file compressed with a non-splittable codec such as gzip. So if your input is one huge compressed file of that kind, it will only ever use 1 mapper on that file. You can, however, split it up yourself into many smaller compressed files.
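
To see why lowering the maximum split size raises the mapper count, here is a minimal sketch of the split arithmetic. It mirrors, in simplified form, how the new-API FileInputFormat sizes splits for one uncompressed, splittable file. The class name SplitEstimate and the 50 MB file size are made up for illustration, and it is an assumption (not confirmed by the question) that MatrixMultiplicationJob's input format honours this setting.

```java
// Simplified model of FileInputFormat split sizing for a single splittable file.
// SplitEstimate is a hypothetical helper, not part of Hadoop or Mahout.
final class SplitEstimate {
    // Hadoop lets the last split run up to 10% over the split size
    // instead of creating a tiny trailing split.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file of the given size.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 50 * mb;   // assumed input size, smaller than one block
        long blockSize = 64 * mb;

        // Defaults: min split size 1 byte, max split size effectively unlimited.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // With the max split size lowered to 16 MB (16777216 bytes).
        long smallSplit = computeSplitSize(blockSize, 1L, 16 * mb);

        System.out.println("default: " + countSplits(fileSize, defaultSplit) + " split(s)");
        System.out.println("16 MB max: " + countSplits(fileSize, smallSplit) + " split(s)");
    }
}
```

Under these assumptions the 50 MB file yields 1 split with the defaults and 4 splits with a 16 MB maximum. The real job also depends on the input format in use and on whether the file is splittable at all, so treat the numbers as an estimate rather than a guarantee.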