amazon-web-services amazon-swf

AWS SWF cancelling child workflows automatically


I have an AWS SWF workflow that creates child workflows at runtime, one per input file: for x input files it creates x child workflows. It works fine with around 400 input files, creating and executing all 400 child workflows successfully.

The issue: when the input has around 500 files or more, the workflow starts that many child workflows successfully but then automatically cancels some of them. I have tried different configurations, but nothing has worked.

I think the AWS limit on the number of child workflows is 1000, so that should not be the issue.

Current child workflow config:
Execution Start To Close Timeout: 2 hours 1 minute
Task Start To Close Timeout: 1 minute 30 seconds

Main workflow config:
Execution Start To Close Timeout: 9 hours
Task Start To Close Timeout: 1 minute 30 seconds


Solution

  • My guess is that some exception is thrown in the workflow code, which by default cancels the other workflows in the same cancellation scope. Read the TryCatchFinally documentation for more on the cancellation semantics.

    In general, I wouldn't recommend that many child workflows in SWF; you can always do it hierarchically. For example, 30 children that each start 30 children of their own give 900 workflows.
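
    To illustrate the hierarchical idea, here is a minimal sketch in plain Java of how the input files could be split into batches, one batch per intermediate child workflow. It makes no SWF calls. The class name `FanOutPlan`, the method `plan`, and the cap of 30 are assumptions for illustration (30 is just the figure used above), not part of any AWS API.

```java
import java.util.ArrayList;
import java.util.List;

public class FanOutPlan {
    // Hypothetical cap on direct children per workflow (the figure from the answer).
    public static final int MAX_CHILDREN = 30;

    // Splits the input files into batches of at most MAX_CHILDREN files.
    // Each batch would be handed to one intermediate child workflow, which in
    // turn starts one grandchild workflow per file in its batch.
    public static <T> List<List<T>> plan(List<T> files) {
        if (files.size() > MAX_CHILDREN * MAX_CHILDREN) {
            // Two levels cover at most 30 * 30 = 900 files; more would need a third level.
            throw new IllegalArgumentException("too many files for a two-level hierarchy");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < files.size(); start += MAX_CHILDREN) {
            int end = Math.min(start + MAX_CHILDREN, files.size());
            batches.add(new ArrayList<>(files.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            files.add("file-" + i); // placeholder file names
        }
        List<List<String>> batches = plan(files);
        System.out.println(batches.size() + " intermediate workflows, last one handles "
                + batches.get(batches.size() - 1).size() + " files");
    }
}
```

    With 500 input files this gives 17 batches (16 of 30 files and one of 20), so the main workflow would start 17 children instead of 500, and no single workflow would have more than 30 direct children.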