Tags: hadoop, workflow, sqoop, oozie

Oozie Workflow uses incorrect user directory


I'm trying to execute an Oozie workflow that was written by a colleague. I execute this command:

oozie job -config ./job.properties -run

I have set up parameters in job.properties, including my user.name, and when I examine the logs I can see those values being used in the workflow - for example, files are created in my HDFS directory (e.g. exportDir=/user/${user.name}/ra_export). But at some point the workflow fails with permission errors, because it attempts to modify something in my colleague's directory. It's acting as if ${user.name} were cached somewhere and an old value is being used. Has anyone seen behavior like this, and if so, what's the solution?

Update:

Here's the failing portion of the log:

  1215755 [main] INFO  org.apache.hadoop.hive.ql.exec.FileSinkOperator  - Moving tmp dir: hdfs://hadoop-name-01.mycompany.com:8020/tmp/hive-staging_hive_2015-08-06_19-51-57_511_3052536268795125086-1/_tmp.-ext-10000 to: hdfs://hadoop-name-01.mycompany.com:8020/tmp/hive-staging_hive_2015-08-06_19-51-57_511_3052536268795125086-1/-ext-10000
  1215761 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.MOVE.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
  1215762 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Loading data to table client_reporting.campaign_web_events_export from hdfs://hadoop-name-01.mycompany.com:8020/tmp/hive-staging_hive_2015-08-06_19-51-57_511_3052536268795125086-1/-ext-10000
  1215821 [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Failed with exception Permission denied: user=clark.bremer, access=WRITE, inode="/user/john.smith/ra_export":john.smith:john.smith:drwxr-xr-x

But I can see from the top of the same log that the job.properties variable substitutions are taking place successfully:

  Starting the execution of prepare actions
  Deletion of path hdfs://hadoop-name-01.mycompany.com:8020/user/clark.bremer/foo_export succeeded.
  Creating directory at /user/clark.bremer/foo_export succeeded.
  Completed the execution of prepare actions successfully

But as you can see in the failing portion of the log, it's using both the wrong username (john.smith instead of clark.bremer) and the wrong export directory (ra_export instead of foo_export). John used ra_export the last time he ran this workflow.

Here's a portion of my job.properties file:

user.name=clark.bremer
jobTracker=hadoop-name-01.mycompany.com:8032
nameNode=hdfs://hadoop-name-01.mycompany.com:8020
exportDir=/user/${user.name}/foo_export

And here's some snippets from the query that creates the table:

 CREATE EXTERNAL TABLE IF NOT EXISTS client_reporting.campaign_web_events_export
        ....
 stored as textfile location '${EXPORTDIR}/campaign_web_events';
 insert overwrite table client_reporting.campaign_web_events_export

Where EXPORTDIR is in my user directory.


Solution

  • Have you checked which user created the Hive table you are trying to access?

    Can you drop the existing Hive table, create a new one as your user, run the same job again, and check the status?
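This suggestion lines up with how Hive treats `CREATE EXTERNAL TABLE IF NOT EXISTS`: when the table already exists, the statement is a no-op, so the `LOCATION` captured when the table was first created (here, the colleague's ra_export path) is kept and the new `${EXPORTDIR}` value is never applied. A sketch of how to verify and reset this, assuming access to the Hive CLI on the cluster:

```sql
-- Inspect the table's metadata; the "Location:" field will still show the
-- path that was in effect when the table was FIRST created, because
-- CREATE EXTERNAL TABLE IF NOT EXISTS does nothing for an existing table.
DESCRIBE FORMATTED client_reporting.campaign_web_events_export;

-- Dropping an EXTERNAL table removes only the Hive metadata, not the files
-- in HDFS. On the next workflow run, the CREATE EXTERNAL TABLE statement
-- will re-create the table using the current ${EXPORTDIR} value.
DROP TABLE IF EXISTS client_reporting.campaign_web_events_export;
```

After dropping the table, re-running the workflow with your job.properties should create it under your own user directory, and the INSERT OVERWRITE should no longer touch the colleague's path.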