
How to reserve yarn containers for high priority processing - pig jobs


I run Pig jobs on Hadoop 2.4.1 with YARN. Some of my Pig jobs are high priority (they should finish in less than 20 minutes). I'm looking for a Pig or YARN option to reserve YARN containers for my high-priority jobs. Is there a way to do this?

Right now, I always depend on whatever other jobs are running and, depending on their size, my priority jobs can wait for hours.

Thanks, Romain


Solution

  • You can use the Fair Scheduler for this.

    Fair Scheduler organizes your apps into "queues", and then shares resources fairly between these queues. In addition to providing fair sharing, it allows assigning guaranteed minimum shares to queues, which helps in ensuring that certain queues always get sufficient resources. You can also assign different weights to different queues etc.

    To use the fair scheduler, put the following in your yarn-site.xml.

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    

    To set up the queues, you will need to create an allocation file, fair-scheduler.xml, and put it in the Hadoop conf directory. You can find the format of the allocation file and some more information here: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
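
    As a rough sketch of what such an allocation file can look like: the queue name highpriority and all the numbers below are placeholders I made up, not values taken from your cluster, so size them to what your jobs actually need.

    ```xml
    <?xml version="1.0"?>
    <!-- fair-scheduler.xml: illustrative sketch only, values are assumptions -->
    <allocations>
      <!-- hypothetical queue for the time-critical Pig jobs -->
      <queue name="highpriority">
        <!-- guaranteed minimum share for this queue -->
        <minResources>20000 mb,10 vcores</minResources>
        <!-- gets twice the share of a weight-1.0 queue when resources are contended -->
        <weight>2.0</weight>
        <!-- seconds below the minimum share before preemption may be attempted -->
        <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
      </queue>
      <!-- everything else keeps running in a regular queue -->
      <queue name="default">
        <weight>1.0</weight>
      </queue>
    </allocations>
    ```

    The minSharePreemptionTimeout element only has an effect when preemption is enabled in yarn-site.xml.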

    In your case, you would want to create a separate queue for your high-priority jobs. Assign that queue a minimum share large enough for those jobs to complete in the required amount of time. You may also want to set yarn.scheduler.fair.preemption to true, so that the scheduler preempts containers of already running jobs when your queue is not getting its minimum share.
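
    Enabling preemption would be one more property in yarn-site.xml, next to the scheduler class. A minimal sketch:

    ```xml
    <!-- yarn-site.xml: let the Fair Scheduler preempt containers -->
    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>
    ```

    Preemption only starts after a queue has stayed below its minimum share for the minSharePreemptionTimeout configured in the allocation file (in seconds). Your Pig jobs also have to be submitted to the high-priority queue. Assuming a queue named highpriority (a name chosen here purely for illustration), putting SET mapreduce.job.queuename 'highpriority'; at the top of the Pig script should do that, but please verify it on your setup.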