YARN: Controlling concurrency of jobs


I've been trying to use YARN's resource queues to control contention by limiting the number of jobs running at any given time (I only have MR jobs, no other YARN applications). My situation is this:

I have a service that accepts requests from users and runs reports as MR jobs. These jobs can be time-consuming, and at peak times they fight for resources; with too much sharing, no single job makes decent progress. I'm trying to limit the number of reports that can run on a queue at any given time.

I can do part of this by setting the queue's maximum number of running apps to the desired value. I can then submit MR jobs to the cluster and only n jobs (say) run at any given time. The problem is that there is no way to preempt tasks within the same queue (or none that I know of). I'd like to submit jobs to this queue so that when there is one job, it occupies the whole queue; when a second job arrives, some tasks of the first job get killed so that both jobs have equal resources; when a third job comes along, the resources are divided further, and so on (basically the way the Fair Scheduler works with preemption, but inside a single queue instead of across multiple queues).
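To make the desired behaviour concrete, here is a toy model of the arithmetic only. It is a rough sketch, not YARN's actual scheduling or preemption logic: it assumes the queue's capacity is a fixed number of equally sized containers, and the function names `fair_shares` and `containers_to_preempt` are hypothetical, made up for this illustration.

```python
def fair_shares(total_containers, num_jobs):
    """Split a queue's containers as evenly as possible among running jobs.

    Toy assumption: all containers are the same size, so a job's share
    is just a container count.
    """
    if num_jobs <= 0:
        return []
    base, extra = divmod(total_containers, num_jobs)
    # When the split is uneven, the earliest jobs get one extra container.
    return [base + 1 if i < extra else base for i in range(num_jobs)]


def containers_to_preempt(current, total_containers):
    """Containers each running job would give up when one new job arrives.

    `current` lists the containers held by each running job, in arrival order.
    """
    target = fair_shares(total_containers, len(current) + 1)
    # zip stops at the shorter list, so the new job (last target) is skipped.
    return [max(0, held - share) for held, share in zip(current, target)]


if __name__ == "__main__":
    queue_size = 12  # hypothetical queue capacity, in containers
    print(fair_shares(queue_size, 1))                 # [12]
    print(containers_to_preempt([12], queue_size))    # [6]
    print(containers_to_preempt([6, 6], queue_size))  # [2, 2]
```

With a 12-container queue, a lone job holds all 12; a second job would take 6 from the first; a third would take 2 from each of the two running jobs, leaving 4 apiece.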

Is this possible? Right now only one user (my service) submits the jobs. I could propagate my service's end user down to the cluster (which I'd prefer not to do, but could if there is no other way) in order to create sub-queues per user. But then I don't know how to get the behaviour I want: there are many users, and I'm not sure how to set a limit (weight) per queue without knowing the queue's name, since the queue is created when the job is submitted.

Thanks in advance for any help.


Solution

  • I found that it is not possible to preempt containers within the same queue. I worked around this with a compromise.