oozie distcp job execution


I have an Oozie workflow that performs a distcp operation. The workflow file is below:

<workflow-app xmlns="uri:oozie:workflow:0.3" name="distcp-wf">
<start to="distcp-node"/>
<action name="distcp-node">
    <distcp xmlns="uri:oozie:distcp-action:0.1">
        <job-tracker>${jobtracker}</job-tracker>
        <name-node>${namenode}</name-node>
        <prepare>
            <delete path="${namenode}/tmp/mohit/"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queue_name}</value>
            </property>
        </configuration>
        <arg>-m 1</arg>
        <arg>${number_of_mapper}</arg>
        <arg>-skipcrccheck</arg>
        <arg>${namenode}/tmp/mohit/data.txt</arg>
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
</action>
<kill name="fail">
    <message>DistCP failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

I want to set the number of mappers with distcp's -m option. How can I do that? I have tried

<arg>-m 1</arg>

and

<arg>1</arg>

But neither worked for me. The error I am getting is below:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, Returned value from distcp is non-zero (-1)
      java.lang.RuntimeException: Returned value from distcp is non-zero (-1)

Solution

  • The <arg> elements carry the input and output paths, as described in the Oozie documentation.

    The first arg indicates the input and the second arg indicates the output.

    To change the number of mappers or reducers, set it in the configuration block, for example:

    <configuration>
        <property>
            <name>mapred.reduce.tasks</name>
            <value>${firstJobReducers}</value>
        </property>
    </configuration>
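
For the mapper count from the question, here is a hedged sketch rather than a confirmed fix. It assumes Oozie hands each <arg> element to DistCp as one command-line token, so the -m flag and its value go in separate elements, followed by the input path and then the output path. The ${destination_dir} property is hypothetical (the workflow in the question lists only one path); the other names come from the question.

```xml
<distcp xmlns="uri:oozie:distcp-action:0.1">
    <job-tracker>${jobtracker}</job-tracker>
    <name-node>${namenode}</name-node>
    <!-- Assumption: each <arg> becomes one argument token, so the flag and its value are split -->
    <arg>-m</arg>
    <arg>${number_of_mapper}</arg>
    <!-- First path argument: the input -->
    <arg>${namenode}/tmp/mohit/data.txt</arg>
    <!-- Second path argument: the output; ${destination_dir} is a hypothetical property -->
    <arg>${destination_dir}</arg>
</distcp>
```

If that assumption holds, a single <arg>-m 1</arg> would reach DistCp as the literal token "-m 1", which it may not recognize as an option. That could be one cause of the non-zero return value shown in the question.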