
"pause" for every host


Before rolling updates, I want to set a downtime for every host in our monitoring tool, and I created a custom module for this. Setting the downtime can fail for reasons that cannot be fixed on our end. In that case I want to let the user decide whether the deployment should be aborted or continued without setting a downtime.

So let's say I call my module like this:

- downtime:
    duration: 5m
    comment: whatever
  ignore_errors: true
  register: downtime

So I'm ignoring the errors to be able to proceed. Otherwise hosts for which setting the downtime failed would not be processed any further.

In the next step, I would like the user to confirm manually, for every host that has no downtime set, whether to proceed.

- name: Request user confirmation to proceed in case downtime could not be set
  pause:
    prompt: 'Downtime could not be set for all hosts. Do you want to proceed? Press return to continue. Press Ctrl+c and then "a" to abort'
  when: downtime | failed

Unfortunately, the pause module (actually an action plugin) only pauses for the first host that is processed. If the first host failed, it pauses; if the first host passed and all other hosts failed, it simply continues with all hosts.

This seems to be the intended behavior. From the docs:

The pause module integrates into async/parallelized playbooks without any special considerations (see also: Rolling Updates). When using pauses with the serial playbook parameter (as in rolling updates) you are only prompted once for the current group of hosts.

So whatever I do, even if I used serial: 1 (which is not possible in this case), pause would only stop for the first host.

Right now I simply pause without a condition and let the user decide whether to continue, regardless of whether the downtime task failed. But since failures should be very rare, this is a manual step I'd like to avoid.

Can anyone see a way to either:

  • pause for every host (that failed)
  • pause once, in case any host failed

Solution

  • This bug report gave me the inspiration to work with a loop. The following solution asks for confirmation of every failed host separately:

    - downtime:
        duration: 5m
        comment: whatever
      ignore_errors: true
      register: downtime
    
    - name: Saving downtime state
      set_fact:
        downtime_failed: "{{ downtime | failed }}"
    
    - name: Request user confirmation to proceed in case downtime could not be set
      pause:
        prompt: 'Downtime could not be set for {{ item }}. Do you want to proceed? Press return to continue. Press Ctrl+c and then "a" to abort'
      when: hostvars[item]['downtime_failed'] | bool
      with_items: "{{ play_hosts }}"
    

    Since the pause module is only run for the first host listed in the inventory, we loop over all available hosts (play_hosts). To access the state of all other hosts, we first store the result as a fact (set_fact). We can then read it via hostvars, which holds all facts of all hosts of the current play.
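
    The question also asked for a way to pause only once if any host failed. The following is a minimal, untested sketch of that variant, not a confirmed solution. It assumes the downtime_failed fact from the set_fact task above is stored as a boolean on every host in play_hosts, and that the extract filter (Ansible 2.1 or later) is available. The variable name downtime_failed_hosts is a hypothetical helper introduced only for this example.

    ```yaml
    - name: Request user confirmation once if downtime could not be set on any host
      vars:
        # Hypothetical helper variable (not part of the original solution):
        # names of all hosts in the play whose downtime_failed fact is true.
        downtime_failed_hosts: "{{ play_hosts | map('extract', hostvars) | selectattr('downtime_failed') | map(attribute='inventory_hostname') | list }}"
      pause:
        prompt: 'Downtime could not be set for {{ downtime_failed_hosts | join(", ") }}. Do you want to proceed? Press return to continue. Press Ctrl+c and then "a" to abort'
      # The condition does not depend on the current host, so it does not
      # matter that pause is only evaluated for the first host.
      when: downtime_failed_hosts | length > 0
    ```

    Because the condition is computed from hostvars of all hosts rather than from the first host's own result, the single prompt appears whenever at least one host failed, and it lists all affected hosts at once.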