Tags: hive, sqoop, oozie, hcatalog, oozie-workflow

Sqoop action with HCatalog in an Oozie workflow fails


When I use the sqoop export command to export data from Hive to Microsoft SQL Server, I have a problem using a Sqoop action with HCatalog in Ambari Views.

The following command runs correctly from the shell and works fine:

sqoop export --connect 'jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB' --connection-manager org.apache.sqoop.manager.SQLServerManager --username 'DataApp' --password 'D@t@User' --table tr1 --hcatalog-database temporary --hcatalog-table 'daily_tr'

but when I create a Sqoop action with this command in an Oozie workflow, I get the following error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org/apache/hive/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatOutputFormat
        at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
        at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
        at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
        at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
        at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:171)
        at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:153)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:75)
        at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.mapreduce.HCatOutputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 27 more

To work around this error I did the following:

  • Under the folder containing workflow.xml, I created a lib folder and put into it all the Hive jar files from the sharelib directory (/user/oozie/share/lib/lib_201806281525405/hive).

My goal was to make the components recognize the HCatalog jar files on the classpath, but I'm not sure about this approach; maybe I shouldn't do it and there is a different solution for this error.
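The workaround above can be sketched with HDFS shell commands. The workflow application path (/user/ambari-qa/test, taken from the workflow.xml shown later in this post) and the sharelib directory are examples from this cluster; adjust both to your environment:

```shell
# Create a lib folder next to workflow.xml on HDFS
hdfs dfs -mkdir -p /user/ambari-qa/test/lib

# Copy the Hive jars from the Oozie sharelib into it
# (you can list the current sharelib contents with: oozie admin -shareliblist hive)
hdfs dfs -cp '/user/oozie/share/lib/lib_201806281525405/hive/*' /user/ambari-qa/test/lib/
```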

Anyway, after doing that the error changed to the following:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org.apache.hadoop.hive.shims.HadoopShims.getUGIForConf(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.getUGIForConf(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
        at org.apache.hive.hcatalog.common.HiveClientCache$HiveClientCacheKey.<init>(HiveClientCache.java:201)
        at org.apache.hive.hcatalog.common.HiveClientCache$HiveClientCacheKey.fromHiveConf(HiveClientCache.java:207)
        at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:138)
        at org.apache.hive.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:564)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:85)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:63)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:349)
        at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
        at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
        at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
        at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
        at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:171)
        at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:153)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:75)
        at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)

Versions:

HDP 2.6.5.0

yarn 2.7.3

hive 1.2.1000

sqoop 1.4.6

oozie 4.2.0

Please help me solve these errors. Why does the Sqoop command work correctly in the shell but fail in the Oozie workflow?


Solution

  • I solved my problem as follows:

    1- Add ( --hcatalog-home /usr/hdp/current/hive-webhcat ) to the command tag in workflow.xml:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="loadtosql">
        <start to="sqoop_export"/>
        <action name="sqoop_export">
            <sqoop xmlns="uri:oozie:sqoop-action:0.4">
                <job-tracker>${resourceManager}</job-tracker>
                <name-node>${nameNode}</name-node>
            <command>export --connect jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB --connection-manager org.apache.sqoop.manager.SQLServerManager --username DataApp --password D@t@User --table tr1 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database temporary --hcatalog-table daily_tr</command>
                <file>/user/ambari-qa/test/lib/hive-site.xml</file>
                <file>/user/ambari-qa/test/lib/tez-site.xml</file>
            </sqoop>
            <ok to="end"/>
            <error to="kill"/>
        </action>
        <kill name="kill">
            <message>${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    

    2- On HDFS, create a lib folder beside workflow.xml and put hive-site.xml and tez-site.xml into it (upload hive-site.xml from /etc/hive/2.6.5.0-292/0/ and tez-site.xml from /etc/tez/2.6.5.0-292/0/ to the lib folder on HDFS).

    Accordingly, the workflow above defines these two files (hive-site.xml and tez-site.xml):

    <file>/user/ambari-qa/test/lib/hive-site.xml</file>
    <file>/user/ambari-qa/test/lib/tez-site.xml</file>
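Step 2 can be done with the HDFS shell. The paths are the ones given above; /user/ambari-qa/test is this workflow's application directory, so adjust it to wherever your workflow.xml lives:

```shell
# Create the lib folder beside workflow.xml on HDFS
hdfs dfs -mkdir -p /user/ambari-qa/test/lib

# Upload the client configuration files referenced by the <file> elements
hdfs dfs -put /etc/hive/2.6.5.0-292/0/hive-site.xml /user/ambari-qa/test/lib/
hdfs dfs -put /etc/tez/2.6.5.0-292/0/tez-site.xml /user/ambari-qa/test/lib/
```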
    

    3- Define the following property in the job.properties file:

    oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
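For context, a minimal job.properties using this property might look like the sketch below. The host names and ports are placeholders, and the application path is assumed from the workflow above; only the sharelib line is part of the actual fix:

```properties
# Hypothetical cluster endpoints -- replace with your own
nameNode=hdfs://namenode-host:8020
resourceManager=resourcemanager-host:8050

# Assumed HDFS path of the folder containing workflow.xml
oozie.wf.application.path=${nameNode}/user/ambari-qa/test

# The property from step 3: make Oozie add the hive and hcatalog
# sharelibs to the Sqoop action's classpath
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
```

You would then submit the workflow with oozie job -config job.properties -run (adding -oozie http://your-oozie-host:11000/oozie if OOZIE_URL is not set).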
    

    4- Make sure oozie-site.xml under /etc/oozie/conf has the following property specified:

    <property> 
        <name>oozie.credentials.credentialclasses</name>
        <value>hcat=org.apache.oozie.action.hadoop.HCatCredentials</value>
    </property>