Tags: hadoop, mapreduce, google-compute-engine, google-hadoop

What is the number of reducer slots on GCE Hadoop worker nodes?


I am testing the scaling of some MapReduce jobs on Google Compute Engine's Hadoop cluster and seeing some unexpected results. In short, I've been told this behavior may be explained by each worker node in the Hadoop cluster having multiple reducer slots.

Can someone confirm the number of reducer slots per worker node (worker VM) for MapReduce jobs on GCE's Hadoop cluster? I am using the hadoop2_env.sh deployment.

For additional details, https://groups.google.com/a/cloudera.org/forum/#!topic/oryx-user/AFIU2PE2g8o links to a background discussion of the behavior I am experiencing.

Thanks!


Solution

  • In bdutil, the number of reduce slots is a function of the total number of cores on the machine and the environment variable CORES_PER_REDUCE_TASK, applied inside configure_hadoop.sh:

    export NUM_CORES="$(grep -c processor /proc/cpuinfo)"
    export MAP_SLOTS=$(python -c "print int(${NUM_CORES} // \
        ${CORES_PER_MAP_TASK})")
    export REDUCE_SLOTS=$(python -c "print int(${NUM_CORES} // \
        ${CORES_PER_REDUCE_TASK})")
    
    <...>
    
    # MapReduce v2 (and YARN) Configuration
    if [[ -x configure_mrv2_mem.py ]]; then
      TEMP_ENV_FILE=$(mktemp /tmp/mrv2_XXX_tmp_env.sh)
      ./configure_mrv2_mem.py \
          --output_file ${TEMP_ENV_FILE} \
          --total_memory ${TOTAL_MEM} \
          --available_memory_ratio ${NODEMANAGER_MEMORY_FRACTION} \
          --total_cores ${NUM_CORES} \
          --cores_per_map ${CORES_PER_MAP_TASK} \
          --cores_per_reduce ${CORES_PER_REDUCE_TASK} \
          --cores_per_app_master ${CORES_PER_APP_MASTER}
      source ${TEMP_ENV_FILE}
      # Leave TEMP_ENV_FILE around for debugging purposes.
    fi
    

    You can see in hadoop2_env.sh that the default is 2 cores per reduce slot:

    CORES_PER_REDUCE_TASK=2.0
    

    Optimal settings vary by workload, but for most cases the defaults should be fine. As mentioned in the thread you linked, the general approach is to set computation-layer.parallelism in your actual workload to approximately the number of reduce slots you have. With the default settings, multiply the number of machines by the number of cores per machine and divide by 2 to get the number of slots. If you want 1 reduce slot per machine, set CORES_PER_REDUCE_TASK equal to the number of cores per machine; a short sketch of that arithmetic follows below.
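
    To make the arithmetic concrete, here is a minimal sketch. The 5-worker / 4-core figures are hypothetical, and the python -c style just mirrors the (Python 2) snippet from configure_hadoop.sh above:

    # Hypothetical cluster: 5 worker VMs with 4 cores each,
    # using the default CORES_PER_REDUCE_TASK=2.0 from hadoop2_env.sh.
    NUM_WORKERS=5
    CORES_PER_MACHINE=4
    CORES_PER_REDUCE_TASK=2.0

    # Slots per worker = cores // cores-per-reduce-task, then scale by worker count.
    SLOTS_PER_WORKER=$(python -c "print int(${CORES_PER_MACHINE} // ${CORES_PER_REDUCE_TASK})")
    TOTAL_REDUCE_SLOTS=$((NUM_WORKERS * SLOTS_PER_WORKER))
    echo "Reduce slots per worker:  ${SLOTS_PER_WORKER}"   # 2
    echo "Reduce slots in cluster:  ${TOTAL_REDUCE_SLOTS}" # 10

    # For 1 reduce slot per machine, you would set CORES_PER_REDUCE_TASK=4.0
    # (equal to cores per machine) in hadoop2_env.sh before deploying.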

    I say approximately because there are additional advanced settings around the number of reduce tasks in your job, including "speculative execution" settings; a typical recommendation is to set your reduce parallelism a bit lower, perhaps 0.95 times the number of reduce slots, to leave some headroom for failed or stuck reduce tasks.

    Additionally, you may have seen faster performance in some cases when you increased the parallelism beyond the number of reduce slots. Even though running multiple "waves" of reduce tasks would normally add overhead, large variance in the speed of individual reduce tasks can make it worthwhile: in such workloads, the second "wave" can effectively run concurrently with the slowest tasks of the first "wave". The Hadoop wiki previously gave a rule of thumb of setting reduce parallelism to either 0.95 or 1.75 times the number of available reduce slots (a small sketch of both follows below). Here's some further discussion on the topic; the poster there correctly points out that these rules of thumb are only applicable to single-tenant clusters.
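
    As a quick illustration of those two factors, continuing the hypothetical 10-slot cluster from the earlier sketch (the 0.95 and 1.75 values are just the rule-of-thumb numbers quoted above):

    # Rule-of-thumb reduce parallelism for a cluster with 10 reduce slots.
    REDUCE_SLOTS=10

    # 0.95 x slots: a single wave, with headroom for failed or stuck tasks.
    CONSERVATIVE=$(python -c "print int(${REDUCE_SLOTS} * 0.95)")
    # 1.75 x slots: multiple waves, useful when reduce task runtimes vary a lot.
    AGGRESSIVE=$(python -c "print int(${REDUCE_SLOTS} * 1.75)")

    echo "parallelism (0.95 x slots): ${CONSERVATIVE}"  # 9
    echo "parallelism (1.75 x slots): ${AGGRESSIVE}"    # 17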

    If you actually want to share a large cluster concurrently among lots of users, those rules of thumb don't apply: you wouldn't want to hog 100% of the cluster's resources, so you should assign parallelism based purely on the size and characteristics of your workload. In a cloud environment, however, the recommended approach is indeed to run multiple, smaller single-tenant clusters, since you can then tune each cluster specifically for its workload and don't need to worry about resource-packing across lots of different use cases.