I am trying to create a system in which I need to implement the exponential backoff algorithm. I have a controller and workers. A worker sends a request to a particular URL and waits for the response. The controller just assigns tasks to the workers that are free. In case a worker's request fails, the failure status of the request is written to a database.
To implement the exponential backoff algorithm, should the controller run a separate thread that picks up failed requests from the database? Or is there something that can be done at the worker level without holding up the worker for the duration of the retries?
In many cases, retries with backoff are implemented inside the workers. Basically, when a controller calls a worker, the controller just wants the job done, and retries help to mitigate temporary problems such as brief network glitches.
The typical logic is (when a worker is called to run a task):
At the end of the day, this amounts to several calls to the failing resource, with the delay increased after each failure.
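A minimal sketch of that kind of worker-side loop, assuming the delay doubles after each failure. The names `run_with_backoff`, `task` and `max_attempts` are made up for illustration, and `some_delay` is the base delay constant. The `sleep` parameter is there only so the waiting can be swapped out, for example in tests.

```python
import time


def run_with_backoff(task, some_delay=1.0, max_attempts=5, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts calls have failed.

    The wait after the n-th failure (counting from 0) is some_delay * 2**n.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            # Out of attempts: let the last error propagate to the caller,
            # which can then record the failure (e.g. in the database).
            if attempt == max_attempts - 1:
                raise
            # Wait longer after every failure: some_delay, 2x, 4x, ...
            sleep(some_delay * 2 ** attempt)
```

In practice it is common to catch only the errors that are worth retrying (timeouts, connection errors) and to add a bit of random jitter to each delay so that many workers do not retry at the same instant. Both are left out here to keep the sketch short.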
The delay constant (some_delay in the above text) is picked based on the overall system architecture. How long can the controller wait? If the controller itself times out at some point (or the controller's customers time out), then the sum of all the delays must be less than that timeout; otherwise there is no point in retrying jobs, as the customers won't be able to get the results anyway.
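To make the timeout constraint concrete, here is a rough calculation, assuming each delay is `factor` times the previous one (doubling by default). The function names are hypothetical, and the time spent in the requests themselves is ignored, so the real budget is tighter than what this reports.

```python
def total_backoff_wait(some_delay, retries, factor=2.0):
    """Sum of the delays before `retries` retries: some_delay * (1 + factor + ...)."""
    return sum(some_delay * factor ** i for i in range(retries))


def max_retries_within(timeout, some_delay, factor=2.0):
    """Largest number of retries whose total delay stays below `timeout`."""
    if some_delay <= 0 or factor < 1:
        raise ValueError("some_delay must be positive and factor at least 1")
    retries = 0
    # Keep adding retries while the summed delay still fits in the budget.
    while total_backoff_wait(some_delay, retries + 1, factor) < timeout:
        retries += 1
    return retries
```

For example, with `some_delay = 1` second and doubling, four retries already wait 1 + 2 + 4 + 8 = 15 seconds in total, so a caller that gives up after 10 seconds leaves room for at most three retries.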
One more thing to consider is the thread management approach in your application. While a worker waits for the next retry, its thread is busy sleeping, which may or may not be a problem.
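If the sleeping thread does turn out to be a problem, one possible alternative (an assumption on my side, not something your setup requires) is to store the time of the next attempt together with the failure status, free the worker immediately, and let the controller hand the task out again once it is due. A small in-memory sketch, with `FailedTask`, `record_failure` and `due_tasks` as hypothetical names standing in for your database row and queries:

```python
from dataclasses import dataclass


@dataclass
class FailedTask:
    task_id: str
    attempts: int = 0       # failed attempts so far
    retry_at: float = 0.0   # earliest time (in seconds) for the next attempt


def record_failure(task, now, some_delay=1.0, max_attempts=5):
    """Update the task after a failure; return False once retries are used up."""
    task.attempts += 1
    if task.attempts >= max_attempts:
        return False
    # Same exponential schedule, but stored instead of slept on.
    task.retry_at = now + some_delay * 2 ** (task.attempts - 1)
    return True


def due_tasks(tasks, now):
    """Tasks the controller may hand to a free worker right now."""
    return [t for t in tasks if t.retry_at <= now]
```

The trade-off is that the retry state now lives in the database and the controller has to poll it, but no worker is held up between attempts.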
One last point: if you already have retries with backoff, it may make sense to add a circuit breaker as well, so that when a remote resource is down, the system doesn't waste time retrying over and over (and keeping threads busy with nothing).
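A very small circuit breaker could look roughly like this. It is a simplified sketch, not a production implementation: the class and parameter names are invented, it is not thread-safe, and the thresholds are arbitrary defaults. After `failure_threshold` consecutive failures it rejects calls right away for `reset_timeout` seconds, then lets one trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock        # injectable so the timing can be tested
        self.failures = 0         # consecutive failures seen so far
        self.opened_at = None     # time the circuit was opened, None if closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast without touching the remote resource.
                raise CircuitOpenError("remote resource is marked as down")
            # reset_timeout has elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A worker would wrap the request in `breaker.call(...)` inside its retry loop and stop retrying as soon as it sees `CircuitOpenError`.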