I've created two Glue jobs (gluejob1, gluejob2).
I want to create a dependency so that gluejob2 runs only after gluejob1 has completed.
To orchestrate this, I created a Step Functions state machine with the definition below:
{
"gluejob1": {
"Type": "Task",
"Resource": "gluejob1.Arn",
"Comment": "Glue job1.",
"Next": "gluejob2"
},
"gluejob2": {
"Type": "Task",
"Resource": "gluejob2.Arn",
"Comment": "Glue job2.",
"Next": "Gluejob2 Finished Loading"
},
"Gluejob2 Finished Loading": {
"Type": "Pass",
"Result": "",
"End": true
}
}
When I execute this state machine, Step Functions marks the gluejob1 state as successful the moment the job is triggered and moves straight on to trigger gluejob2.
Is there a way to run gluejob2 only after gluejob1 has actually completed?
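You can invoke a Glue job from Step Functions synchronously, so that the task state waits for the job run to complete before moving to the next state. Use the glue:startJobRun.sync service integration as the Resource and pass the job name in Parameters: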
{
"StartAt": "gluejob1",
"States": {
"gluejob1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "gluejob1"
},
"Next": "gluejob2"
},
"gluejob2": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "gluejob2"
},
"Next": "Gluejob2 Finished Loading"
},
"Gluejob2 Finished Loading": {
"Type": "Pass",
"Result": "",
"End": true
}
}
}
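If the job names should come from the execution input rather than being hard-coded, the key takes a .$ suffix and its value must be a JSONPath starting with $. The following is a minimal sketch, not a definitive setup: it assumes the execution is started with an input such as {"ETLJobName1": "gluejob1", "ETLJobName2": "gluejob2"}, where both field names are hypothetical. "ResultPath": null on the first state discards the Glue job run result, so the original input is still available to the second state.

```json
{
  "Comment": "Sketch only: assumes the execution input has the hypothetical fields ETLJobName1 and ETLJobName2.",
  "StartAt": "gluejob1",
  "States": {
    "gluejob1": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName.$": "$.ETLJobName1"
      },
      "ResultPath": null,
      "Next": "gluejob2"
    },
    "gluejob2": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName.$": "$.ETLJobName2"
      },
      "End": true
    }
  }
}
```

Without "ResultPath": null, the output of gluejob1 would replace the state input, and the $.ETLJobName2 lookup in the second state would fail.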