
Oozie: let other forked actions continue if one fails, but terminate after the join


I have a workflow that forks into three parallel paths.

<start to="MY_FORK"/>
<fork name="MY_FORK">
<path start="START_PARALLEL_PATH_1"/>
<path start="START_PARALLEL_PATH_2"/>
<path start="START_PARALLEL_PATH_3"/>
</fork>

Each of the three paths starts a series of actions, any of which can fail. The DAG below is trivial to build; further actions follow after the join. The problem with this DAG is that if any path reaches a kill node, for example the top path, all the other paths are killed as well before they reach the join.

Current execution

However, this is not the desired flow. If an action on one parallel path fails, only that path should stop, and the other paths should continue until the join. For example, if action A2 fails, action A3 is skipped, but C1, C2, and C3 are still executed. The decision node after the join then detects that an error happened and terminates the workflow.

Desired solution

Do you know how I could achieve that?


Solution

  • Option 1: Using wf:actionExternalStatus

    In this case, the idea is simple: use the external status of a node to determine what to do next.

    The external status of a node can be RUNNING, KILLED, FAILED, SUCCEEDED, "FAILED/KILLED", or empty if the node has been skipped. So we need to check for both the KILLED and the FAILED statuses. Given the limitations of the default Oozie EL functions, we can use replaceAll to map any status that contains FAILED or KILLED to the single value FAILED and compare against that:

    <decision name="check-if-action-failed">
      <switch>
        <!-- Each case predicate is one EL expression that must evaluate to a boolean -->
        <case to="kill_A">
          ${replaceAll(wf:actionExternalStatus('A1'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
            replaceAll(wf:actionExternalStatus('A3'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED'}
        </case>
        <case to="kill_C">
          ${replaceAll(wf:actionExternalStatus('C1'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
            replaceAll(wf:actionExternalStatus('C2'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED'}
        </case>
        <default to="next-action"/>
      </switch>
    </decision>
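
    For this decision to be reached at all, a failing action has to hand control to the join rather than to a kill node. A minimal sketch of that wiring, assuming a join named my-join and eliding the action body (both are illustrative assumptions, not taken from the original diagram):

    <!-- Hypothetical branch action: on error it goes to the join, not to a kill node -->
    <action name="A1">
        <!-- action body (shell, hive, map-reduce, ...) elided -->
        <ok to="A2"/>
        <error to="my-join"/>
    </action>

    <!-- The join (name assumed) hands over to the decision node once all branches arrive -->
    <join name="my-join" to="check-if-action-failed"/>

    The remaining actions of the failed branch are skipped this way, while the other branches keep running until they arrive at the join.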
    

  • Option 2: Using a shell script

    The workaround shown below uses a shell script in the KO nodes. These are the nodes where we want a parallel branch to stop executing while the other parallel branches still reach the join.

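    In this setup we have two such nodes, KO_A and KO_C. Each invokes a shell script, echo-ko.sh, that simply prints a key=value line, the Java properties format that Oozie expects when capturing the output of a shell action: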

    echo "STATUS=KO"
    

    In Oozie terms these nodes will look like:

    KO_A

    <action name="KO_A" retry-max="3" retry-interval="2">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <exec>echo-ko.sh</exec>
            <file>/path/to/script/echo-ko.sh</file>
            <!-- capture-output makes the script's key=value stdout available through wf:actionData -->
            <capture-output/>
        </shell>
        <ok to="my-join"/>
        <error to="some-fail-node"/>
    </action>
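
    The actions inside a branch then send their error transition to the KO node instead of a kill node. A minimal sketch, assuming the A2 and A3 action names from the question and eliding the action body:

    <!-- Hypothetical branch action: a failure is routed to KO_A, which records STATUS=KO and moves on to the join -->
    <action name="A2">
        <!-- action body elided -->
        <ok to="A3"/>
        <error to="KO_A"/>
    </action>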
    

    Decision after the join

    <decision name="check-branch-ko">
        <switch>
            <case to="kill_A">
                ${wf:actionData('KO_A')['STATUS'] eq 'KO'}
            </case>
            <!-- Other cases --> 
            <default to="..."/>
        </switch>
    </decision>
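
    To round this off, the join and the kill node that the decision refers to could look roughly like this. This is a sketch only: the message text is an assumption, and wf:lastErrorNode() reports whichever node errored last, which may not be the one you expect if several branches fail.

    <!-- All branches, failed or not, meet here before the decision runs -->
    <join name="my-join" to="check-branch-ko"/>

    <!-- Terminates the workflow once the decision has seen STATUS=KO from branch A -->
    <kill name="kill_A">
        <message>Branch A failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>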