hadoop · mapreduce · oozie

Can a MapReduce job in Oozie read from a file?


When creating a workflow in Oozie, the first step is a Java action that generates a file listing the inputs needed by the next step (a map-reduce action). How can I feed that map-reduce job with that file?

I know that I could tick the Capture output box of the Java step and then use mapred.input.dir in the map-reduce step to turn that captured output into the job's input, but I would rather not depend on that.
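
For reference, that approach would look roughly like the sketch below in the workflow XML (the Hue checkbox generates the equivalent <capture-output/> element; the action names, main class and captured property key are only placeholders):

    <action name="GenerateFileList">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.GenerateFileList</main-class>
            <!-- makes whatever the main class writes to the Oozie output
                 properties file available via wf:actionData() -->
            <capture-output/>
        </java>
        <ok to="ProcessFiles"/>
        <error to="Kill"/>
    </action>

    <action name="ProcessFiles">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <!-- the captured comma-separated list of directories becomes the job input -->
                    <value>${wf:actionData("GenerateFileList")["input.dirs"]}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="End"/>
        <error to="Kill"/>
    </action>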

Just for the record, the content of my file looks like:

/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/18,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/19,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/20,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/21,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/22,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/23,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/24,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/25,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/26,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/27,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/28


Solution

  • Do you want to use that file as an input file or as a parameter file?

    In the second case (each step is sketched in the examples after this list),

    • enable the <capture-output/> option for the initial action
    • have it output something like "param.file=/a/b/c/z.txt"
    • in the next action, use the appropriate EL function to retrieve the file name and pass it as a <property> or <env>

      ${wf:actionData("InitialActionName")["param.file"]}

    • then, in that next action, use a few lines of Java to open the HDFS file and do whatever you need with its content, before doing the actual map or reduce work
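
On the producing side, a minimal sketch assuming the initial action is a <java> action: with <capture-output/>, Oozie tells the child JVM where to write the captured properties through the oozie.action.output.properties system property, and the main class stores a java.util.Properties file there (the class name and the /a/b/c/z.txt path are just placeholders):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.Properties;

    public class GenerateFileList {
        public static void main(String[] args) throws Exception {
            // ... build the list of input directories and write it to an HDFS file,
            //     e.g. /a/b/c/z.txt, as described in the question ...
            String paramFile = "/a/b/c/z.txt";

            // With <capture-output/>, Oozie passes the location of the file where
            // captured output must be stored through this system property.
            String outputFile = System.getProperty("oozie.action.output.properties");
            if (outputFile == null) {
                throw new IllegalStateException("Not running inside an Oozie java action");
            }

            Properties props = new Properties();
            props.setProperty("param.file", paramFile);
            try (OutputStream os = new FileOutputStream(new File(outputFile))) {
                // becomes available as wf:actionData("InitialActionName")["param.file"]
                props.store(os, null);
            }
        }
    }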
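
To wire the captured value into the next action, a <map-reduce> action can receive it as a plain <property> (the name param.file is only a convention carried over from the sketch above; action and transition names are placeholders):

    <action name="ProcessFiles">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>param.file</name>
                    <!-- file name captured from the initial action -->
                    <value>${wf:actionData("InitialActionName")["param.file"]}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="End"/>
        <error to="Kill"/>
    </action>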
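
Inside the map-reduce job itself, the file can then be opened from HDFS, for example in the Mapper's setup(). This is a sketch under the assumptions above: the mapper class and key/value types are placeholders, and the file is assumed to contain a single comma-separated line of paths, as shown in the question:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class OrderHistoryMapper extends Mapper<LongWritable, Text, Text, Text> {

        private String[] inputDirs;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            // "param.file" was set in the workflow from wf:actionData(...)
            Path paramFile = new Path(conf.get("param.file"));
            FileSystem fs = paramFile.getFileSystem(conf);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(paramFile), StandardCharsets.UTF_8))) {
                // one comma-separated line of directories, as in the question
                inputDirs = reader.readLine().split(",");
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... use inputDirs as needed while processing the records ...
        }
    }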