I'm working on a project that uses Oozie to schedule Hadoop jobs. Recently, Oozie has started throwing java.lang.ClassNotFoundException from time to time. I checked the error log, and I'm fairly sure I put all the required jar files in HDFS under the lib directory. Below is the Hadoop task log; the last 10 lines show the jar files I need. But when I check the distcache directories on the node, they are empty. It doesn't happen every time, only when the workflow runs some hours after its previous run. So I suspect that Hadoop cleaned up the distcache and did not copy the jar files back into the distcache directories on the next run, while Oozie still put those same, now empty, directories on the classpath. Has anybody encountered the same problem? I can't think of a good solution for this.
I'm using Oozie 3.2.0-incubating with Hadoop 1.1.1.
Classpath :
------------------------
/home/workspace/hadoop/libexec/../conf
/usr/java/default/lib/tools.jar
/* some jars from hadoop */
/home/data7/mapred_tmp/taskTracker/distcache/-6071601324996771729_2013238955_873176406/localhost/user/supertool/oozie-supe/0000232-140509184943733-oozie-supe-W/begin--java/java-launcher.jar
/home/data9/mapred_tmp/taskTracker/distcache/-4677386048903657010_1227144840_1337300706/localhost/user/supertool/plannex/app/schedule/lib/mysql-connector-java-5.1.29-bin.jar
/home/data10/mapred_tmp/taskTracker/distcache/-8328135876058302714_-1519042818_64290738/localhost/user/supertool/plannex/app/schedule/lib/plannex-schedule-2.0.0-SNAPSHOT-jar-with-dependencies.jar
/home/data11/mapred_tmp/taskTracker/distcache/-3456058783425455308_886532069_1155570996/localhost/user/supertool/plannex/app/schedule/lib/postgresql-9.1-903.jdbc3.jar
/home/data12/mapred_tmp/taskTracker/distcache/7890488265085818377_2040166227_64563179/localhost/user/supertool/plannex/app/schedule/lib/sqoop-1.4.4.jar
/home/data9/mapred_tmp/taskTracker/distcache/-4677386048903657010_1227144840_1337300706/localhost/user/supertool/plannex/app/schedule/lib/mysql-connector-java-5.1.29-bin.jar
/home/data10/mapred_tmp/taskTracker/distcache/-8328135876058302714_-1519042818_64290738/localhost/user/supertool/plannex/app/schedule/lib/plannex-schedule-2.0.0-SNAPSHOT-jar-with-dependencies.jar
/home/data11/mapred_tmp/taskTracker/distcache/-3456058783425455308_886532069_1155570996/localhost/user/supertool/plannex/app/schedule/lib/postgresql-9.1-903.jdbc3.jar
/home/data12/mapred_tmp/taskTracker/distcache/7890488265085818377_2040166227_64563179/localhost/user/supertool/plannex/app/schedule/lib/sqoop-1.4.4.jar
/home/data3/mapred_tmp/taskTracker/supertool/jobcache/job_201405231920_0043/attempt_201405231920_0043_m_000000_0/work
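To confirm on the affected node which of these entries are really gone, a check along the following lines could be used. This is only a sketch: the function name `missing_classpath_entries` and the assumption that the listing is pasted one path per line (as in the log above) are mine, not anything provided by Oozie or Hadoop.

```python
import os


def missing_classpath_entries(classpath_lines, exists=os.path.exists):
    """Return classpath entries (in first-seen order) that do not exist locally.

    `classpath_lines` is any iterable of strings, one entry per line, such as
    the "Classpath :" section copied from a task log. `exists` can be swapped
    out for testing; by default it checks the local filesystem.
    """
    missing = []
    seen = set()
    for raw in classpath_lines:
        entry = raw.strip()
        # Keep only absolute paths; this skips the header, the dashed
        # separator, blank lines, and placeholder comments like "/* ... */".
        if not entry.startswith("/") or entry.startswith("/*"):
            continue
        # The log can list the same jar more than once; report it once.
        if entry in seen:
            continue
        seen.add(entry)
        if not exists(entry):
            missing.append(entry)
    return missing
```

Running it over the listing saved to a file, for example `missing_classpath_entries(open("classpath.txt"))` with a hypothetical file name, would print nothing but return the paths that the task was told to load from and that are no longer on disk.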
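If it's a MapReduce job, use the "-libjars" option so that the jar files are copied to the distributed cache every time the job is submitted. The jars can also be given as HDFS locations. Note that "-libjars" is a generic Hadoop option, so it only takes effect if the job's driver parses generic options, for example by running through ToolRunner.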
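As a rough illustration of the "-libjars" suggestion, the sketch below assembles such a command line. The helper name `build_hadoop_jar_command`, the driver class, and the job jar name are hypothetical; the HDFS paths in the usage note are only guessed from the lib directory visible in the classpath listing. The one fixed point is the shape of the command: "-libjars" takes a comma-separated list and must come before the application's own arguments.

```python
def build_hadoop_jar_command(job_jar, main_class, lib_jars, app_args=()):
    """Build the argument list for `hadoop jar` with a -libjars option.

    `lib_jars` is a list of jar locations (local paths or HDFS URIs).
    Generic options such as -libjars are placed right after the main class,
    ahead of the application arguments.
    """
    if not lib_jars:
        raise ValueError("lib_jars must contain at least one jar")
    return [
        "hadoop", "jar", job_jar, main_class,
        "-libjars", ",".join(lib_jars),  # comma-separated, no spaces
        *app_args,
    ]
```

For example, `build_hadoop_jar_command("my-job.jar", "com.example.MyDriver", ["hdfs:///user/supertool/plannex/app/schedule/lib/sqoop-1.4.4.jar", "hdfs:///user/supertool/plannex/app/schedule/lib/postgresql-9.1-903.jdbc3.jar"])` yields a list that can be passed to `subprocess.run`. Whether this helps for a job launched through an Oozie action depends on how that action submits the job, so treat it as something to try rather than a confirmed fix.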