How to specify a combiner in Oozie when using the streaming jar


I have a streaming job that I am calling through Oozie. I can run it successfully with a mapper and a reducer, but I don't understand how to pass a combiner. My mapper, reducer, and combiner are all written in Python. Will the following work?

<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <delete path="${HADOOP_LIB}/OutPath"/>
    </prepare>
    <streaming>
        <mapper>python mapper.py</mapper>
        <combiner>python combiner.py</combiner>
        <reducer>python reducer.py</reducer>
    </streaming>
    <configuration>
        <property>
            <name>mapred.input.dir</name>
            <value>${HADOOP_LIB}/input</value>
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>${HADOOP_LIB}/OutPath</value>
        </property>
    </configuration>
    <file>mapper.py</file>
    <file>combiner.py</file>
    <file>reducer.py</file>
</map-reduce>

I could not find any documented use of a <combiner> tag anywhere. Alternatively, can I just run the streaming jar command with the -combiner option in a shell script and call that script from Oozie?


Solution

  • No, the streaming component of the Oozie MapReduce action currently has no combiner option. You would need to invoke the MapReduce streaming jar directly, through either an Oozie Shell action or a Java action, so that you can pass the combiner yourself.
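
As a rough illustration of the Shell-action route, the following is a minimal Python sketch of a launcher script that assembles the `hadoop jar` streaming command with a `-combiner` argument. The function name `build_streaming_command`, the jar location, and the HDFS paths are hypothetical placeholders, not values from the question; the script also assumes the `hadoop` binary is on the PATH of the node where the Shell action runs.

```python
import shlex
import subprocess
import sys


def build_streaming_command(streaming_jar, input_dir, output_dir,
                            mapper="python mapper.py",
                            combiner="python combiner.py",
                            reducer="python reducer.py",
                            files=("mapper.py", "combiner.py", "reducer.py")):
    """Return the argv list for a Hadoop Streaming job that uses a combiner."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -files must come before streaming options.
        "-files", ",".join(files),
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", mapper,
        "-combiner", combiner,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        streaming_jar="/path/to/hadoop-streaming.jar",  # hypothetical location
        input_dir="/user/example/input",                # hypothetical HDFS path
        output_dir="/user/example/OutPath",             # hypothetical HDFS path
    )
    # Always print the command so it shows up in the Oozie action log.
    print(" ".join(shlex.quote(part) for part in cmd))
    # Only submit the job when explicitly asked to, e.g. `launcher.py --run`.
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```

The same command line could be placed in a plain shell script instead. Unlike the `<map-reduce>` action's `<prepare>` block, this sketch does not delete the output directory, so that step would still need to be handled separately before the job runs.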