Why is GCP Dataproc's cluster autoscaling, using YARN as the resource manager, based on memory requests and NOT cores? Is it a limitation of Dataproc or YARN, or am I missing something?
Reference: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/autoscaling
Autoscaling configures Hadoop YARN to schedule jobs based on YARN memory requests, not on YARN core requests.
Autoscaling is centered around the following Hadoop YARN metrics:
Allocated memory refers to the total YARN memory taken up by running containers across the whole cluster. If there are 6 running containers that can use up to 1GB, there is 6GB of allocated memory.
Available memory is YARN memory in the cluster not used by allocated containers. If there is 10GB of memory across all node managers and 6GB of allocated memory, there is 4GB of available memory. If there is available (unused) memory in the cluster, autoscaling may remove workers from the cluster.
Pending memory is the sum of YARN memory requests for pending containers. Pending containers are waiting for space to run in YARN. Pending memory is non-zero only if available memory is zero or too small to allocate to the next container. If there are pending containers, autoscaling may add workers to the cluster.
It's a limitation of Dataproc at the moment. By default, YARN's capacity scheduler finds slots for containers based on memory requests alone (the DefaultResourceCalculator) and ignores core requests entirely. So in the default configuration, Dataproc only needs to autoscale based on YARN pending/available memory.
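To make the arithmetic behind those metrics concrete, here is a minimal Python sketch of how allocated, available, and pending memory relate. This is only an illustration of the definitions quoted above, not Dataproc's actual implementation:

```python
# Memory-only accounting, per the definitions in the Dataproc docs:
# allocated = memory used by running containers, available = total - allocated,
# pending = memory requested by containers still waiting for space.

def yarn_memory_metrics(total_mb, container_requests_mb):
    """Split container requests into allocated vs. pending, memory-only."""
    allocated = 0
    pending = 0
    for request in container_requests_mb:
        if allocated + request <= total_mb:
            allocated += request   # container fits: it runs
        else:
            pending += request     # container waits for space
    available = total_mb - allocated
    return allocated, available, pending

# Six 1GB containers on 10GB of NodeManager memory (the docs' example):
print(yarn_memory_metrics(10240, [1024] * 6))
# -> (6144, 4096, 0): available memory > 0, so autoscaling may remove workers.

# Twelve 1GB containers: the last two don't fit and become pending memory.
print(yarn_memory_metrics(10240, [1024] * 12))
# -> (10240, 0, 2048): pending memory > 0, so autoscaling may add workers.
```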
There are definitely use cases where you want to oversubscribe YARN cores by running more containers than there are cores. For example, our default distcp configs might run 8 low-memory containers on a node manager even if it has only 4 physical cores. Each distcp task is largely I/O-bound and doesn't take much memory, so I think leaving the default of scheduling on memory alone is reasonable.
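As a concrete illustration of that placement behavior, here is a short Python sketch of memory-only packing. The node shape (4 vcores, 12GB of YARN memory) and the 1GB container size are assumptions for illustration, not Dataproc defaults:

```python
# Why memory-only scheduling permits core oversubscription: the scheduler
# checks memory and never looks at vcores. Node shape below is assumed.

NODE_MEMORY_MB = 12288  # assumed YARN memory on one node manager
NODE_VCORES = 4         # assumed physical cores

def fits_memory_only(used_mb, request_mb):
    """DefaultResourceCalculator semantics: only memory is checked."""
    return used_mb + request_mb <= NODE_MEMORY_MB

used_mb, running = 0, 0
# Eight low-memory distcp-style containers (1GB, 1 vcore each):
for _ in range(8):
    if fits_memory_only(used_mb, 1024):
        used_mb += 1024
        running += 1

print(running)                # 8 containers scheduled on one node
print(running > NODE_VCORES)  # True: cores are oversubscribed 2x
```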
If you're interested in configuring autoscaling based on YARN cores as well, I suspect you've turned on YARN's DominantResourceCalculator so that YARN schedules on both memory and cores.
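If that's your setup, it was probably enabled through the capacity scheduler's resource-calculator property. For reference, here is a sketch of setting it at cluster creation time with the google-cloud-dataproc Python client; the project, region, and cluster names are placeholders:

```python
from google.cloud import dataproc_v1

REGION = "us-central1"  # placeholder
DRC = "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)
cluster = {
    "project_id": "my-project",    # placeholder
    "cluster_name": "my-cluster",  # placeholder
    "config": {
        "software_config": {
            # "capacity-scheduler:" is Dataproc's prefix for entries in
            # capacity-scheduler.xml; the property itself is a YARN setting.
            "properties": {
                "capacity-scheduler:yarn.scheduler.capacity.resource-calculator": DRC
            }
        }
    },
}
operation = client.create_cluster(
    request={"project_id": "my-project", "region": REGION, "cluster": cluster}
)
operation.result()  # block until the cluster is created

# gcloud equivalent:
#   gcloud dataproc clusters create my-cluster --region=us-central1 \
#     --properties='capacity-scheduler:yarn.scheduler.capacity.resource-calculator='"$DRC"
```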
Supporting the DominantResourceCalculator in autoscaling is on our roadmap, but we have been prioritizing autoscaling stability fixes first. Feel free to reach out privately to [email protected] to tell us more about your use case.