
Oozie: rerun all non-SUCCEEDED workflows in coordinator


I scheduled a coordinator that launched many individual workflows. This was a backfill coordinator, with both its start and end dates in the past.

A small percentage of these jobs failed due to temporary issues with the input datasets, and now I need to re-run those workflows (without re-running the successful workflows). These unsuccessful workflows have a variety of statuses: KILLED, FAILED, and SUSPENDED.

What is the best way to do this?


Solution

  • I ended up writing a bash script to do this. I won't copy the whole script here, but this was the general outline:

    First, parse the output of oozie job -info to get a list of actions with a given status for a given coordinator:

    # Action IDs look like <coord-id>-C@<n>; keep only the number <n>.
    actions=$(oozie job -info "$oozie_coord" -filter "status=$status" -len 1000 |
              grep -e '-C@' |
              awk '{print $1}' |
              sed -n 's/^.*@\([0-9]*\).*$/\1/p')
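The question involves three statuses (KILLED, FAILED and SUSPENDED), while the step above handles one status at a time. The sketch below shows one way to run that same parsing once per status and merge the results. It is a hypothetical reconstruction, not the author's unpublished script: the function names parse_action_numbers and collect_actions are illustrative, and it assumes, as the pipeline above does, that each action ID has the form <coord-id>-C@<n> and sits in the first column of the oozie job -info output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the author's script: gather coordinator action
# numbers for several statuses so each failed action can be rerun once.

# Read `oozie job -info` output on stdin; print one action number per line.
# Assumes action IDs of the form <coord-id>-C@<n> in the first column.
parse_action_numbers() {
  grep -e '-C@' |                     # keep only coordinator action lines
    awk '{print $1}' |                # first column is the action ID
    sed -n 's/^.*@\([0-9]*\).*$/\1/p' # keep the number after '@'
}

# Usage: collect_actions <coord-id> <status>...
# Prints the sorted, de-duplicated action numbers across all given statuses.
collect_actions() {
  local coord=$1
  shift
  local status
  for status in "$@"; do
    oozie job -info "$coord" -filter "status=$status" -len 1000 |
      parse_action_numbers
  done | sort -n -u
}
```

The result can be captured in the same actions variable used by the loop, for example actions=$(collect_actions "$oozie_coord" KILLED FAILED SUSPENDED). Keeping the parsing in its own function makes it easy to test against saved oozie job -info output without contacting the Oozie server.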
    

    Then loop over these actions and issue rerun commands:

    while read -r action; do
      # Rerun a single coordinator action, identified by its number
      oozie job -rerun "$oozie_coord" -action "$action"