I have a workflow that offloads some heavy work from Google App Engine by adding tasks to a Redis queue. The outside servers that process the Redis queue send a POST request back to GAE once the work is done.
The issue is that sometimes the Redis queue fails, or at least the POST request is never made. This leaves a "waiting" state on GAE that never changes.
To fix this, I plan to implement a "healthcheck" system that automatically switches the state to "invalid" after a certain amount of time has elapsed, but I am wondering which is the best way to do it in terms of resources and pricing.
Option 1: When I submit a task to the Redis queue, I also create a new GAE task, called "Healthcheck", that runs 5 minutes later. If the original task is not done by then, the healthcheck switches it to "invalid" and closes it.
Option 2: When submitting the task to the Redis queue, I block the process with a while True loop and keep refreshing the current state until it switches to done. I would also set up a handler for the deadline exception that updates the status to "invalid" when GAE raises the deadline (in general, after 10 minutes).
PROs/CONs :
Is there a third way to do it that I haven't thought of?
Tasks are free to use (except for storage space, which is very cheap). I don't see why the number of tasks would be a con.
The 2nd option is not really an option in my opinion, as it is expensive and the process can die at any time.
A 3rd option would be a cron job that queries for records that have been in the "running" state for longer than a defined deadline. That would require a composite index on 2 fields (status, dt_created) and would be more expensive than option #1.
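To make the cron variant concrete, here is a minimal sketch of the selection logic it would need, written against plain in-memory dictionaries rather than the datastore. The function name `find_stale` and the 5-minute default are assumptions for illustration; only the field names `status` and `dt_created` come from the description above. On GAE the same filter would be a datastore query on both fields, which is why the composite index is needed.

```python
from datetime import datetime, timedelta


def find_stale(records, now, deadline=timedelta(minutes=5), status="running"):
    """Return records still in `status` that were created before now - deadline.

    `records` is a list of dicts with "status" and "dt_created" keys
    (a hypothetical in-memory stand-in for datastore entities).
    """
    cutoff = now - deadline
    return [
        r for r in records
        if r["status"] == status and r["dt_created"] < cutoff
    ]


def invalidate_stale(records, now, deadline=timedelta(minutes=5)):
    """What the cron handler would do: flip every stale record to "invalid"."""
    stale = find_stale(records, now, deadline)
    for r in stale:
        r["status"] = "invalid"
    return len(stale)
```

The cron job has to run this scan on every tick even when nothing is stale, which is part of why it costs more than a single delayed task per job.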
Go with tasks.
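A rough sketch of what the healthcheck task body could look like, assuming a job record with a status field that starts as "waiting". The `Job` class and `healthcheck` function are hypothetical names, not part of any GAE API. On the first-generation Python runtime the task itself could be scheduled with something like `taskqueue.add(url=..., params=..., countdown=300)`, where the handler URL is whatever you choose.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for the datastore entity tracking one unit of work."""
    job_id: str
    status: str = "waiting"  # becomes "done" when the POST arrives, else "invalid"


def healthcheck(job: Job) -> bool:
    """Body of the delayed task: invalidate the job if it is still waiting.

    Returns True if the job was switched to "invalid", False if it had
    already left the "waiting" state (e.g. the POST callback arrived).
    """
    if job.status == "waiting":
        job.status = "invalid"
        return True
    return False
```

Because the check only acts on jobs that are still "waiting", it is harmless when the worker's POST has already marked the job as done, and running it twice has no further effect.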