I have a Rails application that uses rufus-scheduler to schedule some of its tasks. The scheduler gets initialised in the after_fork block of the unicorn script that starts the Rails application (Reference: https://gist.github.com/jkraemer/3851917).
Recently, I encountered a scenario where the scheduler thread got killed while the unicorn process hosting it was still active. This led to the scheduler being down for a considerable time without anyone noticing.
I would like to set up an alerting and monitoring system to handle such scenarios. It should be capable of:
1. Raising an alert whenever the scheduler thread is killed/aborted.
2. Restarting the scheduler if the Rails process is still running.
What would be a suitable way to check whether the scheduler is running, perhaps through a cron task, and to handle the auto-restart?
If you bring in a cron task to watch rufus-scheduler, why don't you simply drop rufus-scheduler and use Whenever for all the scheduling? You'll thus have a 100% crond solution.
Another alternative: use Foreman to manage a) your Rails application and b) a separate process hosting your rufus-scheduler schedules (it could be a "schedule only" version of your Rails application) (here is a nice example in a blog post). There is also God.
If you look at Clockwork, a gem that was inspired by rufus-scheduler, its documentation details such "one application / multiple processes" scenarios.
You could then look at the various tools out there that watch your logs and alert you in case of errors.
Update:
Maybe this gist could help: https://gist.github.com/jmettraux/5f716c8568f6ab7bb016a4bc11528bc2
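If you prefer to keep the scheduler inside the unicorn worker, the two requirements from the question (alert when the thread dies, restart while the process lives) can be sketched as a small in-process watchdog. This is only a minimal illustration using plain Ruby threads, not the content of the gist above and not part of rufus-scheduler; `ThreadWatchdog`, `starter` and `on_death` are hypothetical names made up for this example.

```ruby
# Hypothetical watchdog: supervises one worker thread and restarts it
# when it is found dead. Uses only the Ruby standard library.
class ThreadWatchdog
  attr_reader :thread, :restarts

  # starter:  a callable that starts the work and returns its Thread.
  # on_death: a callable invoked with the dead thread before the restart
  #           (this is where an alert, e.g. a mail or a log line, would go).
  def initialize(starter, on_death: ->(t) { warn "worker thread died: #{t.inspect}" })
    @starter = starter
    @on_death = on_death
    @restarts = 0
    @thread = @starter.call
  end

  # True while the supervised thread is still running.
  def alive?
    !@thread.nil? && @thread.alive?
  end

  # Call this periodically. Returns true if a restart happened.
  def check
    return false if alive?

    @on_death.call(@thread)
    @thread = @starter.call
    @restarts += 1
    true
  end
end

# Usage sketch: the worker here just sleeps; a real starter would start
# the scheduler and return the thread it runs in.
watchdog = ThreadWatchdog.new(-> { Thread.new { sleep } })
watchdog.check # nothing to do while the worker is alive
```

With rufus-scheduler, the `starter` could create the scheduler and return the thread it runs in (assuming the version you use exposes that thread), and `check` could be called from a lightweight timer thread or from a request hook. Note that such a watchdog lives in the same process as the thing it watches, so it does not replace external monitoring of the unicorn process itself.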