Tags: hadoop, mapreduce, hadoop-yarn, apache-tez, planning

Suggestions required for increasing utilization of YARN containers on our discovery cluster


Current Setup

  • We have a 10-node discovery cluster.
  • Each node of this cluster has 24 cores and 264 GB of RAM. Keeping some memory and CPU aside for background processes, we plan to use 240 GB of memory.
  • Now, when it comes to container setup, since each container may need 1 core, we can have at most 24 containers, each with 10 GB of memory (the arithmetic is sketched after this list).
  • Usually clusters have containers with 1-2 GB of memory, but we are restricted by the cores available to us, or maybe I am missing something.
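A quick back-of-the-envelope check of that ceiling, using only the per-node figures quoted above (the 1 vcore / 10 GB per-container numbers are ours; nothing else is assumed):

```python
# Per-node figures from the setup above.
node_vcores = 24
node_memory_gb = 240       # usable memory after reserving headroom

# Per-container requests as currently planned.
container_vcores = 1
container_memory_gb = 10

# Whichever dimension runs out first caps the container count.
by_cpu = node_vcores // container_vcores            # 24 if vcores are the limit
by_memory = node_memory_gb // container_memory_gb   # 24 if memory is the limit

print(f"containers per node: {min(by_cpu, by_memory)}")  # -> 24 either way
```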

Problem statement

  • As our cluster is used extensively by data scientists and analysts, having just 24 containers does not suffice and leads to heavy resource contention.

  • Is there any way we can increase the number of containers?

Options we are considering

  • If we ask the team to run their many Tez queries from a single file rather than submitting them separately, then we would keep at most one container (a rough sketch of what we have in mind follows below).
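Something like the following is what we are picturing; the Hive CLI's -f and --hiveconf options are standard, but the wrapper script and the query file path are purely illustrative:

```python
import subprocess

# Illustrative only: submit a batch of analyst queries from one file so they
# run through a single Hive-on-Tez session instead of one session per query.
# The file path below is hypothetical.
result = subprocess.run(
    [
        "hive",
        "--hiveconf", "hive.execution.engine=tez",  # run the whole batch on Tez
        "-f", "/tmp/analyst_queries.hql",           # hypothetical batch of queries
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```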

Requests

  1. Is there any other way to manage our discovery cluster?
  2. Is there any possibility of reducing the container size?
  3. Can a vcore (as it's a logical concept) be shared by multiple containers?

Solution

  • Vcores are just a logical unit and not in any way tied to a physical CPU core, unless you are using YARN with CGroups and have yarn.nodemanager.resource.percentage-physical-cpu-limit enabled. Most tasks are rarely CPU-bound and are more typically network I/O-bound. So if you look at your cluster's overall CPU and memory utilization, you should be able to resize your containers based on the wasted (spare) capacity.

    You can measure utilization with a host of tools; sar, Ganglia and Grafana are the obvious ones, but you can also look at Brendan Gregg's Linux Performance tools for more ideas. A rough sketch of the measure-then-resize arithmetic follows below.
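A minimal sketch of that idea, assuming the third-party psutil package for the measurements and a hypothetical 4 GB container size purely as an example; in practice you would use utilization averaged over a representative period from sar/Ganglia/Grafana rather than a one-second sample:

```python
import psutil

# One-second samples standing in for the utilization you would really pull
# from sar, Ganglia or Grafana over days or weeks.
cpu_used_pct = psutil.cpu_percent(interval=1)    # system-wide CPU busy %
mem_used_pct = psutil.virtual_memory().percent   # resident memory in use %

node_memory_gb = 240   # usable memory per node, from the question
node_vcores = 24

# If CPU sits well below 100% while memory is the real bottleneck, shrinking
# containers raises the per-node container count. Because a vcore is only a
# scheduling unit here (no CGroups enforcement), the vcore count advertised by
# the NodeManager (yarn.nodemanager.resource.cpu-vcores) can be set higher than
# the physical core count so that memory, not vcores, is the limiting dimension.
proposed_container_gb = 4   # hypothetical smaller container size
containers_by_memory = node_memory_gb // proposed_container_gb

print(f"CPU used: {cpu_used_pct:.0f}%, memory used: {mem_used_pct:.0f}%")
print(f"containers per node at {proposed_container_gb} GB each: {containers_by_memory}")
```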