CDH 5.4.4 Oozie unable to run Sqoop action - ClassNotFound SqoopMain


Cloudera Community Post

I'm trying to run a simple Sqoop action through Oozie on CDH 5.4.x (using the Cloudera QuickStart VM, which I assume is pre-configured correctly).

When I run the import command via the Sqoop CLI, it all works fine. However, when I attempt to run that same command using an Oozie workflow (through Hue), it fails to find the SqoopMain class.

Error log

2015-07-14 14:58:02,997 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[simpleWF] JOB[0000001-150714084022371-oozie-oozi-W] ACTION[0000001-150714084022371-oozie-oozi-W@sqoop-import] [***0000001-150714084022371-oozie-oozi-W@sqoop-import***]Action updated in DB!
2015-07-14 14:58:12,802 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[quickstart.cloudera] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-150714084022371-oozie-oozi-W] ACTION[0000001-150714084022371-oozie-oozi-W@sqoop-import] callback for action [0000001-150714084022371-oozie-oozi-W@sqoop-import]
2015-07-14 14:58:13,058 INFO org.apache.oozie.action.hadoop.SqoopActionExecutor: SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[simpleWF] JOB[0000001-150714084022371-oozie-oozi-W] ACTION[0000001-150714084022371-oozie-oozi-W@sqoop-import] action completed, external ID [job_1436888351169_0003]
2015-07-14 14:58:13,078 WARN org.apache.oozie.action.hadoop.SqoopActionExecutor: SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[simpleWF] JOB[0000001-150714084022371-oozie-oozi-W] ACTION[0000001-150714084022371-oozie-oozi-W@sqoop-import] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
2015-07-14 14:58:13,085 WARN org.apache.oozie.action.hadoop.SqoopActionExecutor: SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[simpleWF] JOB[0000001-150714084022371-oozie-oozi-W] ACTION[0000001-150714084022371-oozie-oozi-W@sqoop-import] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
    ... 13 more

Workflow action

<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/tmp/etl/${etlUser}/vet_product_categories"/>
        </prepare>
        <arg>import</arg>
        <arg>--connect</arg>
        <arg>jdbc:mysql://${oltpHost}/${oltpName}</arg>
        <arg>--username</arg>
        <arg>${oltpUser}</arg>
        <arg>--password</arg>
        <arg>${oltpPassword}</arg>
        <arg>--table</arg>
        <arg>view_et_product_categories</arg>
        <arg>--target-dir</arg>
        <arg>/tmp/etl/${etlUser}/vet_product_categories</arg>
        <arg>--as-avrodatafile</arg>
        <arg>-m</arg>
        <arg>1</arg>
    </sqoop>
    <ok to="done"/>
    <error to="fail"/>
</action>

Update 1

I looked up oozie.service.WorkflowAppService.system.libpath in Cloudera Manager, and it was set to /user/oozie. Oozie appends share/lib to whatever you put in this field, so the full path is /user/oozie/share/lib.

The folder in HDFS is versioned with a timestamp: /user/oozie/share/lib/lib_20150609033900. I'm not sure how Oozie adds these classes to the classpath, or whether it needs additional help to pick them up.


Solution

  • As it turns out, you need to supply a job.properties file that sets oozie.use.system.libpath=true.

    Setting this property in the workflow.xml does not seem to work.

    What I have now is a bare Sqoop action with no configuration block, plus a job.properties file containing all the properties I need.
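
For reference, a minimal job.properties along these lines might look like the sketch below. Only oozie.use.system.libpath=true comes from the fix itself. The host names, ports, and application path are assumptions for a QuickStart-style setup and need to be adjusted to your cluster. The remaining workflow parameters (oltpHost, oltpName, oltpUser, oltpPassword, etlUser) also have to be defined here, with your own values.

```properties
# Assumed QuickStart VM endpoints; replace with your cluster's values.
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032

# The actual fix: have Oozie add the system share lib to the action's
# classpath, which is where SqoopMain is expected to come from.
oozie.use.system.libpath=true

# Hypothetical HDFS location of the directory containing workflow.xml.
oozie.wf.application.path=${nameNode}/user/${user.name}/simpleWF
```

With that flag set, Oozie should pick up the sqoop directory under the timestamped share lib folder, so the launcher can load org.apache.oozie.action.hadoop.SqoopMain.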