I deployed Rundeck (rundeck/rundeck:4.2.0) and import my inventory using the Ansible Resource Model Source. I have 300 nodes, of which roughly 150 are accessible/online at any given time; the rest are offline (IoT devices). All of this works fine.
My challenge is that when creating jobs I can assign only the nodes that are online, whereas I want to assign ALL nodes (including the offline ones) and keep retrying the job only for those that failed. That is the only way I can track how complete my deployment is. Ideally, Rundeck would be intelligent enough to run the job automatically as soon as a node comes back online.
Any ideas or hints on how to achieve that?
Thanks,
The easiest way is to use the health checks feature (only available in PagerDuty Process Automation On-Prem, formerly "Rundeck Enterprise"). That way you can use a node filter that selects only "healthy" (up) nodes.
With this approach (e.g. configuring a command health check against all nodes) you can dispatch your jobs only to "up" nodes out of the global set of nodes. To do so, use .* as the node filter and !healthcheck:status: HEALTHY as the exclude node filter. When an "offline" node comes back online, the filter and exclude filter should pick it up automatically.
For the Ansible/Rundeck integration, this works with the environment variable ANSIBLE_HOST_KEY_CHECKING=False, or with host_key_checking=false in the [defaults] section of the ansible.cfg file.
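As a minimal sketch of the ansible.cfg variant of that setting (assuming this is the ansible.cfg that your Rundeck Ansible plugin actually reads; its location depends on your installation and is not specified here):

```ini
; ansible.cfg (location is an assumption; use the file your Rundeck instance loads)
[defaults]
; Disable SSH host key checking, equivalent to ANSIBLE_HOST_KEY_CHECKING=False
host_key_checking = False
```

Only one of the two is needed: either the environment variable or the ansible.cfg entry.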
That way you can see all Ansible hosts among your Rundeck nodes, and your commands/jobs are dispatched only to online nodes. If an "offline" node changes its status, the filter should pick it up.