
Using Oozie to combine output file parts


Is it possible to use Oozie to concatenate the output of a MapReduce job into a single file? Let's say I have the output...

part-r-00000
part-r-00001
part-r-00002

and I just want...

output.csv

I know I can pull them down as a single file with hadoop fs -getmerge, but I'm curious if it's possible with a workflow application and HDFS.


Solution

  • Two simple options I can think of:

    1. Amend the job that produced this output to use a single reducer, so it writes only one part file.
    2. Run a follow-up map-reduce action with an identity mapper, an identity reducer, and a single reduce task.
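
As a rough sketch of option 2, a workflow along these lines might work. It assumes the old `mapred` API property names, and the parameters `${jobTracker}`, `${nameNode}`, `${inputDir}` and `${mergedDir}` are hypothetical values you would define in your own job.properties (with `${mergedDir}` as an absolute HDFS path). The action names and the trailing fs action that renames the single part file to output.csv are my own additions for illustration.

```xml
<workflow-app name="merge-parts-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="merge-parts"/>

    <!-- Identity map-reduce pass with one reducer, so only one part file is written -->
    <action name="merge-parts">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <!-- Clear any previous output; mergedDir is a hypothetical parameter -->
                <delete path="${nameNode}${mergedDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <!-- Option 1 amounts to setting this same property on the original job -->
                    <name>mapred.reduce.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <!-- Directory holding part-r-00000, part-r-00001, ... -->
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${mergedDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="rename-output"/>
        <error to="fail"/>
    </action>

    <!-- Rename the single part file; the old API names it part-00000 -->
    <action name="rename-output">
        <fs>
            <move source="${nameNode}${mergedDir}/part-00000"
                  target="${nameNode}${mergedDir}/output.csv"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Merge failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

One caveat with this sketch: with the default text input format, the identity mapper passes each line's byte offset through as the key, so the merged file ends up with an offset and a tab in front of every line, and the lines are ordered by that key rather than by their original file. If that matters, you would need a different input format or a small custom mapper that emits only the line, which is not shown here.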