I can't seem to find a good explanation of what happens when several workers listen to the same AWS Step Functions activity ARN. I'm mainly interested in the mechanics, as I am exploring how to add fault tolerance to the processes that poll Step Functions activities: if one worker fails, how can another worker pick up the slack for an activity task?
If there is a defined pattern, please share :-)
From another conversation, I think I've got the answer to this question:
Suppose there are two workers in two different AZs listening to the same activity ARN. If one AZ goes down before the worker in that AZ is able to get the taskToken, the other worker can pick up the task and start working on it.
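In the other case, if the worker in AZ1 has already picked up the task and AZ1 then goes down, the step will eventually time out (once its TimeoutSeconds or HeartbeatSeconds limit is exceeded), and a retry of the step, if one is configured, could then let the worker in AZ2 pick up the work.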
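To make the timeout-and-retry case concrete, here is a rough sketch of what the activity's Task state might look like, written as a Python dict that serializes to Amazon States Language JSON. The helper name build_activity_state, the example ARN, and all the numeric values are my own assumptions for illustration. The fields themselves (TimeoutSeconds, HeartbeatSeconds, Retry, and the States.Timeout error name) are standard ASL, and States.Timeout is documented as covering missed heartbeats as well as overall timeouts.

```python
import json

# Hypothetical ARN used only for illustration; substitute your own.
EXAMPLE_ACTIVITY_ARN = (
    "arn:aws:states:us-east-1:123456789012:activity:example-activity"
)


def build_activity_state(activity_arn, heartbeat_seconds=30, timeout_seconds=300):
    """Return an ASL Task state that times out and retries an activity.

    If the worker holding the task dies, heartbeats stop, the state fails
    with States.Timeout, and the Retry rule schedules the task again so
    that any surviving worker polling the same ARN can pick it up.
    """
    if heartbeat_seconds >= timeout_seconds:
        raise ValueError("HeartbeatSeconds must be smaller than TimeoutSeconds")
    return {
        "Type": "Task",
        "Resource": activity_arn,
        "TimeoutSeconds": timeout_seconds,
        "HeartbeatSeconds": heartbeat_seconds,
        "Retry": [
            {
                "ErrorEquals": ["States.Timeout"],
                "IntervalSeconds": 5,   # assumed value
                "MaxAttempts": 3,       # assumed value
                "BackoffRate": 2.0,     # assumed value
            }
        ],
        "End": True,
    }


def build_state_machine(activity_arn):
    """Wrap the Task state in a minimal one-state definition."""
    return {
        "StartAt": "DoWork",
        "States": {"DoWork": build_activity_state(activity_arn)},
    }
```

Calling json.dumps(build_state_machine(EXAMPLE_ACTIVITY_ARN), indent=2) gives a definition you could paste into a state machine. A short HeartbeatSeconds matters here: without it, a dead worker is only noticed when the full TimeoutSeconds elapses.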
In a sense, a Step Functions activity is a task queue.
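For anyone who wants to see the "task queue" view in code, below is a minimal sketch of a worker, assuming Python and boto3. Several copies of it (for example one per AZ) can poll the same activity ARN, and each scheduled task is handed to only one of them. The function names process_one_task and run_worker, the handler callback, and the interval values are hypothetical. The client calls (get_activity_task, send_task_heartbeat, send_task_success, send_task_failure) are the boto3 Step Functions operations.

```python
import json
import socket
import threading


def process_one_task(sfn, activity_arn, handler, worker_name, heartbeat_interval=10.0):
    """Poll the activity once. Return True if a task was handled, else False."""
    task = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False  # the long poll ended without any work being available

    stop = threading.Event()

    def beat():
        # Send a heartbeat every heartbeat_interval seconds until stopped.
        while not stop.wait(heartbeat_interval):
            sfn.send_task_heartbeat(taskToken=token)

    beater = threading.Thread(target=beat, daemon=True)
    beater.start()
    try:
        result = handler(json.loads(task["input"]))
    except Exception as exc:
        # "WorkerError" is an arbitrary error name chosen for this sketch.
        sfn.send_task_failure(taskToken=token, error="WorkerError", cause=str(exc)[:256])
    else:
        sfn.send_task_success(taskToken=token, output=json.dumps(result))
    finally:
        stop.set()
        beater.join()
    return True


def run_worker(activity_arn, handler):
    """Poll forever. Run one copy per AZ against the same activity ARN."""
    import boto3  # third-party; imported here so the rest needs only the stdlib
    from botocore.config import Config

    # GetActivityTask long-polls for up to about 60 seconds, so the read
    # timeout is raised above that to avoid cutting the poll short.
    sfn = boto3.client("stepfunctions", config=Config(read_timeout=70))
    while True:
        process_one_task(sfn, activity_arn, handler, worker_name=socket.gethostname())
```

The heartbeat thread is what lets Step Functions notice a dead worker quickly: if the process or its AZ disappears, heartbeats stop, and a state with HeartbeatSeconds set will time out and can be retried by another worker.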