I am wondering whether the number of mappers and reducers differs based on the EC2 instance type you choose. I found that a Large instance uses 3 mappers and 1 reducer. Would that be the same for every other type (for example, an xLarge instance)? I know I can override it through bootstrapping, but I'm just wondering.
No, it isn't the same for every instance type. Amazon has a concept of Hadoop default configurations, which are determined by the AMI version; the latest one is AMI 2.3. These configurations define the default values for a number of Hadoop settings. For example, for an m1.xlarge, the following values are set by default if you use AMI 2.3:
Parameter                                 Value
HADOOP_JOBTRACKER_HEAPSIZE                6912
HADOOP_NAMENODE_HEAPSIZE                  2304
HADOOP_TASKTRACKER_HEAPSIZE               384
HADOOP_DATANODE_HEAPSIZE                  384
mapred.child.java.opts                    -Xmx768m
mapred.tasktracker.map.tasks.maximum      8
mapred.tasktracker.reduce.tasks.maximum   3
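To see what these per-node limits mean for a whole cluster, here is a rough Python sketch. The numbers are copied from the m1.xlarge table above; the dictionary keys, the function names, and the 4-node cluster size are hypothetical, chosen only for illustration. The heap figure is just the sum of the configured maximum heaps when every slot is busy, not a measurement of real memory use.

```python
# Per-node defaults for m1.xlarge on AMI 2.3, copied from the table above.
# The dictionary keys are illustrative names, not Hadoop or EMR identifiers.
M1_XLARGE = {
    "map_slots": 8,              # mapred.tasktracker.map.tasks.maximum
    "reduce_slots": 3,           # mapred.tasktracker.reduce.tasks.maximum
    "child_heap_mb": 768,        # -Xmx768m from mapred.child.java.opts
    "tasktracker_heap_mb": 384,  # HADOOP_TASKTRACKER_HEAPSIZE
    "datanode_heap_mb": 384,     # HADOOP_DATANODE_HEAPSIZE
}


def cluster_slots(node_defaults, worker_nodes):
    """Return (map_slots, reduce_slots) summed over all worker nodes."""
    return (
        node_defaults["map_slots"] * worker_nodes,
        node_defaults["reduce_slots"] * worker_nodes,
    )


def worker_heap_mb(node_defaults):
    """Sum of configured max heaps on one worker when every slot is busy."""
    slots = node_defaults["map_slots"] + node_defaults["reduce_slots"]
    return (
        slots * node_defaults["child_heap_mb"]
        + node_defaults["tasktracker_heap_mb"]
        + node_defaults["datanode_heap_mb"]
    )


if __name__ == "__main__":
    # Assumed cluster size of 4 worker nodes, purely as an example.
    print(cluster_slots(M1_XLARGE, 4))   # (32, 12)
    print(worker_heap_mb(M1_XLARGE))     # 9216
```

So with these defaults, each m1.xlarge worker can run up to 8 map tasks and 3 reduce tasks at once, and the concurrent task count scales linearly with the number of workers.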
For more details, see the following:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HadoopMemoryDefault_AMI2.3.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html