
How to run multiple coordinators in an Oozie bundle


I'm new to Oozie bundles. I want to run multiple coordinators one after another in a bundle job. My requirement is that when one coordinator job completes, a _SUCCESS file is generated, and that _SUCCESS file should then trigger the second coordinator. I don't know how to do that. I tried a data-dependency approach, in which the second coordinator tracks the output files generated by the previous coordinator. Here is the code I tried.

Let's say there are two coordinator jobs, A and B. I want to trigger only coordinator A, and coordinator B should start only once the _SUCCESS file for coordinator A has been generated.
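For context, a bundle that packages both coordinators could look like the minimal sketch below. It assumes the uri:oozie:bundle:0.2 schema and hypothetical coordinator directories ${aDir}/coordA and ${bDir}/coordB, each holding a coordinator.xml; only ${aDir} and ${bDir} come from this post. A bundle submits all of its coordinators together, so "triggering only A" in practice means B is running too, but B's actions stay WAITING until their input dependency (A's _SUCCESS file) is satisfied.

```xml
<!-- Sketch only: bundle name, coordinator names and app-path directories are hypothetical -->
<bundle-app name="a-then-b-bundle" xmlns="uri:oozie:bundle:0.2">
    <!-- Coordinator A: produces the output directory and its _SUCCESS flag -->
    <coordinator name="coordA">
        <app-path>${aDir}/coordA</app-path>
    </coordinator>
    <!-- Coordinator B: its actions wait on A's output via a dataset/done-flag dependency -->
    <coordinator name="coordB">
        <app-path>${bDir}/coordB</app-path>
    </coordinator>
</bundle-app>
```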

Coordinator A (coordinator.xml):

<action>
    <workflow>
        <app-path>${aDir}/aWorkflow</app-path>
    </workflow>
</action>
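For reference, the excerpt above normally sits inside a complete coordinator-app definition. The sketch below assumes the uri:oozie:coordinator:0.4 schema, a hypothetical coordinator name coordA and a hypothetical ${END_TIME} property; ${freq}, ${START_TIME1} and ${aDir} are the properties already used in this question.

```xml
<!-- Sketch of coordinator A; coordA and ${END_TIME} are hypothetical -->
<coordinator-app name="coordA" frequency="${freq}" start="${START_TIME1}" end="${END_TIME}"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <!-- No input-events: A materializes an action at every ${freq} tick -->
    <action>
        <workflow>
            <!-- Directory containing aWorkflow's workflow.xml -->
            <app-path>${aDir}/aWorkflow</app-path>
        </workflow>
    </action>
</coordinator-app>
```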

This calls the corresponding workflow, and the _SUCCESS file is generated at ${aDir}/aWorkflow/final_data/${date}/aDim, so I included this location in coordinator B:

  <datasets>
    <dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
      <uri-template>${aDir}/aWorkflow/final_data/${date}/aDim</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="input1">
      <instance>${START_TIME1}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${bDir}/bWorkflow</app-path>
    </workflow>
  </action>

But when I run the bundle, the first coordinator gets KILLED by itself, whereas if I run each coordinator individually, both run successfully. I don't understand why they are all getting KILLED. Please help me sort this out.


Solution

  • I found an easy way to do it, so I'm sharing the solution. Notes on coordinator.xml for coordinator B:
    1) The dataset instance should be based on the start time of the second coordinator, not on a time instance of the first coordinator (such as ${START_TIME1}); otherwise that coordinator gets KILLED.
    2) If you want to run multiple coordinators one after another, you can also add controls to coordinator.xml, e.g. concurrency, timeout or throttle. These controls are described in detail in chapter 6 of the "Apache Oozie" book.
    3) In <instance> I used ${coord:latest(0)}, which takes the most recently generated folder under the given output path.
    4) For input-events, the data-in's name must be passed to ${coord:dataIn(...)}, as in ${coord:dataIn('coordInput1')} for the workflow's input_files property; otherwise Oozie will not consider the dataset.

    30 1 ${aimDir}/aDimWorkflow/final_data/${date}/aDim _SUCCESS ${coord:latest(0)}
    ${bDir}/bWorkflow input_files ${coord:dataIn('coordInput1')}
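The XML tags in the coordinator.xml above were lost when the page was extracted, leaving only their values. One possible full coordinator B, assembled from the four points and the question's dataset, is sketched below. Assumptions: the uri:oozie:coordinator:0.4 schema; hypothetical names coordB, ${START_TIME2} and ${END_TIME}; illustrative control values (the answer's own "30 1" values are not mapped to elements because their tags did not survive); and the question's output path ${aDir}/aWorkflow/final_data/${date}/aDim instead of the ${aimDir}/aDimWorkflow path shown above.

```xml
<!-- Sketch of coordinator B; coordB, ${START_TIME2} and ${END_TIME} are hypothetical -->
<coordinator-app name="coordB" frequency="${freq}" start="${START_TIME2}" end="${END_TIME}"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <controls>
        <!-- Illustrative values (assumed): wait up to 60 minutes for input, run one action at a time -->
        <timeout>60</timeout>
        <concurrency>1</concurrency>
    </controls>
    <datasets>
        <dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
            <uri-template>${aDir}/aWorkflow/final_data/${date}/aDim</uri-template>
            <!-- An instance counts as available only once this file exists in it -->
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="coordInput1" dataset="input1">
            <!-- Point 3: newest available instance instead of a fixed ${START_TIME1} -->
            <instance>${coord:latest(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${bDir}/bWorkflow</app-path>
            <configuration>
                <!-- Point 4: hand the resolved input path to bWorkflow as ${input_files} -->
                <property>
                    <name>input_files</name>
                    <value>${coord:dataIn('coordInput1')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

With this layout, each B action stays WAITING until an instance of input1 contains _SUCCESS, and bWorkflow can read the resolved path as ${input_files}. If ${date} is a single job property, every instance maps to the same directory; Oozie's ${YEAR}, ${MONTH} and ${DAY} uri-template variables are the usual way to give each instance its own path.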