
How to pass the Hadoop job queue name in jpmml cascading?


I am trying to execute a PMML model with the Cascading framework, using the jpmml-cascading library from this project: https://github.com/jpmml/jpmml-cascading

I followed all the steps and was able to build example-1.2-SNAPSHOT-job.jar with the mvn clean install command.

However, when I run the jar with the command below:

hadoop jar example-1.2-SNAPSHOT-job.jar /tmp/cascading/model.pmml file:///tmp/cascading/input.csv file:///tmp/cascading/output

I get the exception below because I do not have the right to submit jobs to the default queue. In our Hadoop cluster the default queue is reserved for admin use, so a normal user cannot run a Hadoop job without providing a queue name.

Exception:
16/01/06 04:41:37 ERROR ipc.FailoverRPC: FailoverProxy: Failing this Call: submitJob for error(RemoteException): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): **User test cannot perform operation SUBMIT_JOB on queue default.**
 Please run "hadoop queue -showacls" command to find the queues you have access to .
    at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:179)
    at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:136)
    at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:113)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:4524)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:481)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2000)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1996)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1994)

I cannot see where to provide the Hadoop job queue in the repository.

Can anyone suggest how I can pass the Hadoop job queue name?

Note:

  1. I have tried setting the property mapred.job.queue.name to the queue name, both on the command line and in the code itself, but the job still shows the same error.

  2. I have also tried running the job from an Oozie shell action node and passing the queue name in the application workflow, but I believe that applies only to the Oozie job itself, not to the Hadoop jobs launched through the shell action node.


Solution

  • You are using the wrong settings. You should use the settings for Hadoop 2.x.

    The following configuration properties (present in mapred-site.xml) control submission to the job queues.

    Hadoop 1.x

    • mapred.acls.enabled: Whether ACL checks are enabled to verify a user's privileges during a queue operation. It is set to false by default.

    • mapred.job.queue.name: Queue to which a job is submitted. Default value is default.

    Hadoop 2.x

    • mapreduce.cluster.acls.enabled: Whether ACL checks are enabled to verify a user's privileges during a queue operation. It is set to false by default.

    • mapreduce.job.queuename: Queue to which a job is submitted. Default value is default.
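
    When it is unclear which of the two queue property names the cluster honours, one option is to set both. The sketch below only builds a java.util.Properties object. It assumes the driver can hand such an object to whatever creates the flow (in a Cascading application that is typically the flow connector). Whether the jpmml-cascading example driver exposes such a hook is not confirmed here. The class name QueueProperties and the queue name "analytics" are hypothetical.

```java
import java.util.Properties;

class QueueProperties {
    // Hadoop 1.x property name, as listed above.
    static final String HADOOP1_QUEUE_KEY = "mapred.job.queue.name";
    // Hadoop 2.x property name, as listed above.
    static final String HADOOP2_QUEUE_KEY = "mapreduce.job.queuename";

    // Returns a Properties object carrying the queue name under both keys.
    static Properties forQueue(String queueName) {
        if (queueName == null || queueName.trim().isEmpty()) {
            throw new IllegalArgumentException("queue name must not be empty");
        }
        Properties properties = new Properties();
        properties.setProperty(HADOOP1_QUEUE_KEY, queueName);
        properties.setProperty(HADOOP2_QUEUE_KEY, queueName);
        return properties;
    }

    public static void main(String[] args) {
        // "analytics" is a hypothetical queue name; use one reported by
        // "hadoop queue -showacls" for your user.
        Properties properties = forQueue("analytics");
        properties.forEach((key, value) -> System.out.println(key + "=" + value));
        // Assumption: the driver would pass these properties on when it
        // builds its flow; that step is not shown here.
    }
}
```

    Setting both keys to the same value keeps the sketch independent of the Hadoop version, at the cost of one redundant entry.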

    You can set these values in different ways:

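    • Pass it with the -D option on the command line when running the job, for example -Dmapreduce.job.queuename=default. Generic options such as -D take effect only if the job's main class parses them, for example by running through ToolRunner.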
    • Set it in the Driver for the job (Hadoop 2.x):

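      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;
      // Set the target queue (replace "default" with a queue you can submit to) before creating the job.
      Configuration conf = new Configuration();
      conf.set("mapreduce.job.queuename", "default");
      Job job = Job.getInstance(conf, "JobName");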
      
    • Set it in the mapred-site.xml file of the cluster.
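
    The -D route works only when the program actually reads such options. In stock Hadoop jobs that is done by ToolRunner and GenericOptionsParser; whether the jpmml-cascading example driver does so is not established here. If it does not, the driver could separate the -D definitions from the positional arguments itself. The sketch below is a simplified stand-in for that idea, not Hadoop's own parser; the class name DefinitionArgs and the queue name "analytics" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class DefinitionArgs {
    // Key/value pairs collected from -D arguments.
    final Properties definitions = new Properties();
    // Everything else, in the original order (model, input, output, ...).
    final List<String> remaining = new ArrayList<>();

    // Splits "-Dkey=value" (or "-D key=value") arguments from positional ones.
    static DefinitionArgs parse(String[] args) {
        DefinitionArgs result = new DefinitionArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String definition = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                definition = args[++i];          // "-D key=value" form
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                definition = arg.substring(2);   // "-Dkey=value" form
            }
            if (definition == null) {
                result.remaining.add(arg);       // positional argument
                continue;
            }
            int eq = definition.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value but got: " + definition);
            }
            result.definitions.setProperty(definition.substring(0, eq), definition.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        DefinitionArgs parsed = parse(new String[] {
            "-Dmapreduce.job.queuename=analytics",   // hypothetical queue name
            "/tmp/cascading/model.pmml",
            "file:///tmp/cascading/input.csv",
            "file:///tmp/cascading/output"
        });
        System.out.println("queue: " + parsed.definitions.getProperty("mapreduce.job.queuename"));
        System.out.println("positional arguments: " + parsed.remaining);
    }
}
```

    The collected definitions can then be merged into the job configuration or properties before the job is created, while the remaining arguments keep their original positions.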