Tags: shell, hadoop, ftp, hdfs, oozie

Oozie runs shell scripts on random nodes


I wrote something like a custom Oozie FTP action (similar to the simple example described in "Professional Hadoop Solutions" by Boris Lublinsky, Kevin T. Smith, and Alexey Yakubovich). We have HDFS on node1 and the Oozie server on node2. Node2 also has an HDFS client.

My problem:

  1. The Oozie job is started from node1 (all needed files are located in HDFS on node1).
  2. The custom Oozie FTP action successfully downloads CSV files from the FTP server onto node2 (where the Oozie server is located).
  3. I need to move the files into HDFS and create an external table from the CSV on node1. I tried to use a Java action and call the fileSystem.moveFromLocalFile(...) method. I also tried a Shell action like /usr/bin/hadoop fs -moveFromLocal /tmp/import_folder/filename.csv /user/user_for_import/imported/filename.csv, but neither had any effect. All actions seem to look for the files on node1; I get the same result if I start the Oozie job from node2. (A minimal sketch of the Java attempt follows this list.)
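
For reference, a minimal sketch of the Java-action attempt described in step 3, assuming the action runs with the cluster's default HDFS configuration on its classpath (the class name is hypothetical; the paths are the placeholders from the question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MoveCsvToHdfs {
        public static void main(String[] args) throws Exception {
            // Uses the default HDFS taken from core-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fileSystem = FileSystem.get(conf);

            // Placeholder paths from the question. The local source path exists
            // only on the machine where this code actually runs, i.e. whichever
            // cluster node Oozie happens to schedule the action on.
            Path local = new Path("/tmp/import_folder/filename.csv");
            Path target = new Path("/user/user_for_import/imported/filename.csv");

            // Copies the local file into HDFS and removes the local copy.
            fileSystem.moveFromLocalFile(local, target);
            fileSystem.close();
        }
    }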

Question: can I specify the node on which the FTP action runs, so that it downloads the files from FTP onto node1? Or is there another way to get the downloaded files into HDFS besides the approaches described?


Solution

  • Oozie runs all of its actions as MapReduce jobs on nodes of the configured MapReduce cluster. There is no way to make Oozie run a particular action on a specific node.

    Basically, you should use Flume to ingest files into HDFS. Set up a Flume agent on your FTP node.
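
    If you do stay with the custom FTP action from the question rather than Flume, one way to avoid depending on any particular node's local disk is to stream the FTP download straight into HDFS. The sketch below is only an illustration under assumed names: it uses Apache Commons Net's FTPClient, a hypothetical FTP host ftp.example.com with anonymous login, a NameNode at hdfs://node1:8020, and placeholder remote and target paths; adjust all of these to your environment.

        import org.apache.commons.net.ftp.FTP;
        import org.apache.commons.net.ftp.FTPClient;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IOUtils;
        import java.io.InputStream;
        import java.net.URI;

        public class FtpToHdfs {
            public static void main(String[] args) throws Exception {
                // Assumed FTP server and credentials; replace with your own.
                FTPClient ftp = new FTPClient();
                ftp.connect("ftp.example.com");
                ftp.login("anonymous", "");
                ftp.enterLocalPassiveMode();
                ftp.setFileType(FTP.BINARY_FILE_TYPE);

                // Assumed NameNode address; point this at the HDFS on node1.
                FileSystem fs = FileSystem.get(URI.create("hdfs://node1:8020"),
                                               new Configuration());
                Path target = new Path("/user/user_for_import/imported/filename.csv");

                // Stream the remote file straight into HDFS; no local temp file,
                // so it does not matter which node Oozie schedules the action on.
                try (InputStream in = ftp.retrieveFileStream("/pub/filename.csv");
                     FSDataOutputStream out = fs.create(target)) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
                ftp.completePendingCommand();
                ftp.logout();
                ftp.disconnect();
                fs.close();
            }
        }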