Oozie Shared Lib: where to place jars


I have installed the Cloudera CDH QuickStart VM 5.5 and am running a Sqoop action in an Oozie workflow. It fails with an error saying the MySQL JDBC driver is missing. I came across an SO answer here that says mysql-connector-java.jar should be placed in Oozie's shared lib path on HDFS, under the sqoop directory.

When I browse Oozie's shared lib path on HDFS, however, I see two sqoop subdirectories where the jar could go:

/user/oozie/share/lib/sqoop

and

/user/oozie/share/lib/lib_20151118030154/sqoop

Aside from sqoop, directories for hive, pig, distcp, and mapreduce-streaming also exist under both lib and lib/lib_20151118030154.

So the question is: where do I place my connector jar, in the first directory or the second?

What is the difference (in content or in purpose) between these two paths with respect to the sqoop, hive, pig, distcp, and mapreduce-streaming jars for Oozie?


Solution

  • The lib_20151118030154 sub-dir would be the current version of the ShareLibs, as of 18-NOV-2015. The versioning allows you to make updates without stopping the Oozie service -- check the documentation here.

    In other words: the Oozie service keeps an in-memory list of the JARs in each ShareLib (based on what was present in the latest version at boot time), so adding a JAR makes no difference until (a) you stop and restart the service or (b) you resync the service as explained in the doc above.
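
As a rough illustration of the two points above, the following Python sketch picks the newest timestamped ShareLib directory from a listing and builds the commands to copy the connector jar there and resync Oozie. It is a sketch under assumptions, not a definitive procedure: the `lib_<yyyyMMddHHmmss>` naming pattern is inferred from the directory name and date quoted above, the function names `latest_sharelib` and `install_commands` are hypothetical, and the local jar path and Oozie URL are placeholders to replace with your own. The commands are only built, never executed.

```python
import re
from datetime import datetime

# Assumption: versioned ShareLib directories are named lib_<yyyyMMddHHmmss>,
# as suggested by lib_20151118030154 (18 Nov 2015, 03:01:54).
_LIB_DIR = re.compile(r"^lib_(\d{14})$")


def latest_sharelib(dir_names):
    """Return the newest lib_<timestamp> directory name, or None if none match."""
    stamped = []
    for name in dir_names:
        match = _LIB_DIR.match(name)
        if match:
            created = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
            stamped.append((created, name))
    return max(stamped)[1] if stamped else None


def install_commands(jar, sharelib_root, lib_dir, oozie_url):
    """Build (but do not run) the commands that add a jar and resync Oozie."""
    target = f"{sharelib_root}/{lib_dir}/sqoop/"
    return [
        # Copy the jar into the sqoop directory of the current ShareLib version.
        ["hdfs", "dfs", "-put", jar, target],
        # Ask the running Oozie server to reload its ShareLib listing.
        ["oozie", "admin", "-oozie", oozie_url, "-sharelibupdate"],
        # List the sqoop ShareLib to check that the jar is now visible.
        ["oozie", "admin", "-oozie", oozie_url, "-shareliblist", "sqoop"],
    ]


if __name__ == "__main__":
    # Directory names as seen under /user/oozie/share/lib in the question.
    names = ["sqoop", "hive", "pig", "distcp", "mapreduce-streaming",
             "lib_20151118030154"]
    lib_dir = latest_sharelib(names)
    # The jar path and Oozie URL below are placeholders, not values from the post.
    for cmd in install_commands("mysql-connector-java.jar",
                                "/user/oozie/share/lib",
                                lib_dir,
                                "http://localhost:11000/oozie"):
        print(" ".join(cmd))
```

The printed commands can be reviewed and then run by hand as a user with write access to the ShareLib directory. Restarting the Oozie service instead of running `-sharelibupdate` is option (a) from the answer.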