When creating a workflow in Oozie, I have a first Java step that generates a file containing the list of files I need for the next step (a map-reduce job). How can I feed that map-reduce job with that file?
I know that I could tick the Capture output box of the Java step and then use mapred.input.dir in the map-reduce step to consume that captured output as an input, but I would like to avoid relying on that.
Just for the record, the content of my file looks like:
/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/18,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/19,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/20,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/21,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/22,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/23,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/24,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/25,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/26,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/27,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/28
Do you want to use that file as an input file or as a parameter file?
In the second case,
in the next Action, use the appropriate EL function to retrieve the file name and pass it down as a <property> or <env>, for example:
${wf:actionData("InitialActionName")["param.file"]}
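For reference, here is a minimal sketch of where that param.file value can come from. It assumes the initial Java action still uses Oozie's capture-output mechanism, but only to publish the HDFS path of the list file (not its content); the class name, key, and path below are illustrative:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Properties;

public class BuildInputList {
    public static void main(String[] args) throws Exception {
        // ... write the comma-separated list of input dirs to HDFS as you already do ...
        String listFile = "/user/me/myapp/input-dirs.txt";   // hypothetical HDFS path of that file

        // Publish only that path to Oozie: write a Properties file to the local
        // location exposed through the oozie.action.output.properties system property.
        String oozieOutput = System.getProperty("oozie.action.output.properties");
        Properties props = new Properties();
        props.setProperty("param.file", listFile);            // key read back via wf:actionData
        try (OutputStream os = new FileOutputStream(oozieOutput)) {
            props.store(os, null);
        }
    }
}
```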
Then, in the next step itself, use a few lines of Java to open that HDFS file and do whatever you need with its content, before the actual Map or Reduce work starts (see the sketch below).
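A minimal sketch of that last part, assuming the map-reduce job is launched from a Java driver class and that the workflow passes the list file's HDFS path as the first command-line argument (the class and argument names are illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class OrderHistoryDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path listFile = new Path(args[0]);   // HDFS path of the list file, passed from the workflow

        // Read the single comma-separated line of input directories from HDFS.
        String inputDirs;
        FileSystem fs = FileSystem.get(conf);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(listFile), StandardCharsets.UTF_8))) {
            inputDirs = reader.readLine().trim();
        }

        Job job = Job.getInstance(conf, "order-history");
        job.setJarByClass(OrderHistoryDriver.class);
        // addInputPaths accepts a comma-separated list, which is exactly what the file contains.
        FileInputFormat.addInputPaths(job, inputDirs);
        // ... set mapper, reducer, output format/path as usual ...
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since the file already holds a comma-separated list of directories, its content maps directly onto FileInputFormat.addInputPaths, which takes that format as-is.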