Mapper distribution in an EMR cluster

How does EMR prioritize core and task nodes while distributing mappers? Does it even matter?

Example: A sample job requires 5 mappers. The core nodes alone could run all 5 mappers, and so could the task nodes. Will the core nodes get all 5 mappers, will the task nodes get all 5, or is it a mix (based on proprietary EMR algorithms)?

Solution

EMR does not currently do anything special when placing mappers on CORE or TASK instances. However, Hadoop by default tries to honor data locality, so if your mappers read from HDFS, they are more likely to run on CORE instances (which run HDFS) than on TASK instances (which do not). Running HDFS is currently the only difference between CORE and TASK instances. So in the example above, with input stored in HDFS, you would expect the 5 mappers to lean toward the CORE nodes, but that is a preference rather than a guarantee.
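
To make the locality preference concrete, here is a minimal toy sketch in Python. It is not Hadoop's actual scheduler and not anything EMR-specific; the function `assign_mappers`, the host names, and the slot counts are all hypothetical and exist only for illustration. It assumes each input split lists the hosts that hold a replica of its data, and that a mapper goes to one of those hosts when a slot is free there, falling back to any free node otherwise.

```python
def assign_mappers(splits, free_slots):
    """Toy locality-first placement (illustrative only, not Hadoop's scheduler).

    splits: list of lists; each inner list names the hosts holding a replica.
    free_slots: dict mapping host name -> number of free map slots.
    Returns one host name per split, or None when no slot is free.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    placement = []
    for replica_hosts in splits:
        # Prefer a node that already stores the split's data.
        local = [h for h in replica_hosts if slots.get(h, 0) > 0]
        # Otherwise fall back to any node that still has a free slot.
        candidates = local or [h for h, n in slots.items() if n > 0]
        if not candidates:
            placement.append(None)
            continue
        host = candidates[0]
        slots[host] -= 1
        placement.append(host)
    return placement


# Hypothetical cluster: TASK nodes hold no HDFS blocks, CORE nodes do.
free_slots = {"task-1": 3, "task-2": 3, "core-1": 3, "core-2": 3}

# Five splits read from HDFS: replicas exist only on the CORE nodes.
hdfs_splits = [["core-1", "core-2"]] * 5
# Five splits with no locality information at all.
non_local_splits = [[]] * 5

hdfs_placement = assign_mappers(hdfs_splits, free_slots)
non_local_placement = assign_mappers(non_local_splits, free_slots)

print(hdfs_placement)
print(non_local_placement)
```

In this toy model, the HDFS-backed splits all land on the CORE hosts because that is where their replicas are, while the splits without locality information simply go to whichever node has a free slot first. A real cluster adds rack awareness, scheduler policy, and timing, so actual placement can differ.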