Rundeck Monitoring via ITRS Geneos

We have set up various jobs in Rundeck and want to monitor them using ITRS Geneos. We tried writing queries against the execution and scheduled_execution tables, but scheduled_execution stores schedules inconsistently: the same schedule can appear as 1-5 or as MON-FRI, which makes the queries difficult to write.

What we want to monitor:
1. Whether there are any failed executions.
2. Whether any job is running longer than its average execution time.
3. Whether a job is running longer than its schedule interval, i.e. if a job is scheduled to run every 5 minutes, we should be alerted if it runs for more than 5 minutes.

Please note: I understand we can do this with email alerts directly from Rundeck, but we want to use ITRS Geneos, so I am looking for a query or API reference that might help.

Please suggest if any solution is available. Thanks.

Solution

There are a few alternatives to the default email notification for obtaining the information you need.

You can set a Webhook Notification in your job definition, which will POST the execution data to a webhook URL.

You can query Rundeck's API to collect the information for a specific project or for all projects, or query specific executions to narrow the results down to specific jobs. Listing executions provides the following information, which you can use for the monitoring you require:

status - identifies a failed execution
date-started and date-ended - the start and end of the execution
averageDuration - the average duration of the associated Job in milliseconds, if known
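As a rough sketch of how those fields could drive the first two checks, the Python below lists executions for one project and flags the failed ones and those that ran (or are still running) longer than the Job's averageDuration. The base URL, API version, project name, and token are placeholders you would supply. The JSON layout it assumes (an executions list whose date-started and date-ended carry a unixtime in milliseconds, with averageDuration nested under job) is my reading of the API response, so check it against your Rundeck version.

```python
import json
import urllib.request


def fetch_executions(base_url, api_version, project, token, running=False):
    """Return the executions of one project (all, or only the running ones)."""
    suffix = "/executions/running" if running else "/executions"
    url = f"{base_url}/api/{api_version}/project/{project}{suffix}"
    request = urllib.request.Request(
        url,
        headers={"X-Rundeck-Auth-Token": token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("executions", [])


def failed(executions):
    """Executions whose status is 'failed'."""
    return [e for e in executions if e.get("status") == "failed"]


def over_average(executions, now_ms):
    """Executions that lasted longer than the job's average duration.

    A running execution has no end time yet, so now_ms is used instead.
    Executions without a known average or start time are skipped.
    """
    slow = []
    for execution in executions:
        average = execution.get("job", {}).get("averageDuration")
        started = execution.get("date-started", {}).get("unixtime")
        if average is None or started is None:
            continue
        ended = execution.get("date-ended", {}).get("unixtime", now_ms)
        if ended - started > average:
            slow.append(execution)
    return slow
```

A Geneos-side script could call fetch_executions(..., running=True) on each sample and pass the result to over_average with the current time in milliseconds.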

Alternatively, you can get the state of an execution, which provides overall information such as the start and end time and the current state, plus more detailed information down to the node and workflow step level.
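For the third check (a job still running past its schedule interval), one possible approach is to compare the start time in the execution state with an interval you supply yourself, which avoids parsing the schedule out of the database. This is only a sketch: the field names startTime and completed, and the ISO 8601 timestamp ending in Z, are assumptions about the state response, and interval_seconds is a hypothetical value you would keep per job.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def exceeds_interval(state, interval_seconds, now=None):
    """True if an unfinished execution has run longer than interval_seconds.

    `state` is the parsed JSON of one execution state (assumed field names).
    """
    if state.get("completed"):
        return False
    started = state.get("startTime")
    if not started:
        return False
    now = now or datetime.now(timezone.utc)
    elapsed = (now - parse_time(started)).total_seconds()
    return elapsed > interval_seconds
```

For a job scheduled every 5 minutes you would call exceeds_interval(state, 300) and alert when it returns True.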

The information collected via the API above can also be gathered with the RD CLI.

Furthermore, a notification plugin can be developed (in either Java or Groovy) and used in a job definition. It can be triggered in the following cases:

onstart - the Job started
onsuccess - the Job completed without error
onfailure - the Job failed or was aborted
onavgduration - the Execution exceeded the average duration of the Job
onretryablefailure - the Job failed but will be retried

This is a good way to push the information to ITRS or other monitoring tools, instead of querying the API or RD CLI for it.
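Whichever route collects the data, it still has to reach Geneos in a form a sampler can read. As a hedged sketch, assuming a Geneos sampler that runs a script and reads comma-separated rows with a header line from its standard output, executions could be rendered like this. The column names are my own choice, and the job name and date fields are assumptions about the API response.

```python
import csv
import io


def executions_to_csv(executions):
    """Render executions as CSV text: one header row, then one row each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["executionId", "job", "status", "started"])
    for execution in executions:
        writer.writerow([
            execution.get("id"),
            execution.get("job", {}).get("name", ""),
            execution.get("status", ""),
            execution.get("date-started", {}).get("date", ""),
        ])
    return buffer.getvalue()
```

Printing the returned text from the sampler script would give one row per execution, which rules on the status column could then alert on.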

Hope this helps!