I am building an AWS Glue workflow composed of long-running jobs, many of which are prone to failure. Is there any way to re-run a specific branch of my workflow after a failure?
For example, my workflow looks something like this:
<Start Trigger> -> [Job 1] -> [Job 2] -> [Job 4]
↳ [Job 4]
Let's say [Job 1] and [Job 4] each take 3 hours and both complete successfully. Then [Job 2] is triggered but fails, leaving my workflow in this state:
<Start Trigger> -> [Job 1 ✔] -> [Job 2 ✗] -> [Job 4]
↳ [Job 4 ✔]
I make a change which fixes [Job 2] and believe it will run successfully when re-run. I'd like to be able to re-run only the [Job 2] -> [Job 4] branch of the workflow since all other parent jobs have completed successfully.
Is there any way this can be done in AWS Glue? I'm considering building an AWS Step Functions workflow of Glue jobs instead, as Step Functions workflows seem to have this functionality.
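This has been possible since August 2020: AWS Glue can resume a failed workflow run. After fixing the failing job, you resume the run from the failed node, and the nodes that already succeeded are not run again. See the documentation: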
https://docs.aws.amazon.com/glue/latest/dg/resuming-workflow.html
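
If you want to script the resume rather than use the console, here is a minimal sketch using boto3's `get_workflow_run` and `resume_workflow_run` calls. The helper names (`failed_job_node_ids`, `resume_failed_branch`) are my own, and the rule for deciding which nodes count as "failed" (a job node with at least one failed run and no successful run) is an assumption you may want to adjust for your workflow. The Glue client is passed in so the logic can be tried without AWS access.

```python
# States treated as "did not succeed" (an assumption; adjust as needed).
BAD_STATES = {"FAILED", "TIMEOUT", "ERROR", "STOPPED"}


def failed_job_node_ids(run):
    """Return the UniqueId of each JOB node that failed and never succeeded.

    `run` is the "Run" dict returned by get_workflow_run(IncludeGraph=True).
    """
    node_ids = []
    for node in run.get("Graph", {}).get("Nodes", []):
        if node.get("Type") != "JOB":
            continue  # skip triggers and crawlers
        job_runs = node.get("JobDetails", {}).get("JobRuns", [])
        states = {r.get("JobRunState") for r in job_runs}
        if states & BAD_STATES and "SUCCEEDED" not in states:
            node_ids.append(node["UniqueId"])
    return node_ids


def resume_failed_branch(glue, workflow_name, run_id):
    """Resume a workflow run from its failed job nodes.

    Returns the run ID of the resumed run, or None if nothing failed.
    """
    run = glue.get_workflow_run(
        Name=workflow_name, RunId=run_id, IncludeGraph=True
    )["Run"]
    node_ids = failed_job_node_ids(run)
    if not node_ids:
        return None
    response = glue.resume_workflow_run(
        Name=workflow_name, RunId=run_id, NodeIds=node_ids
    )
    return response["RunId"]


# Example usage (workflow name and run ID are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   new_run_id = resume_failed_branch(glue, "my-workflow", "wr_0123...")
```

The resumed run is assigned a new run ID, which is what `resume_workflow_run` returns, so use that ID when polling for the status of the re-run.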