Ansible task for checking that a host is really offline after shutdown


I am using the following Ansible playbook to shut down a list of remote Ubuntu hosts all at once:

- hosts: my_hosts
  become: yes
  remote_user: my_user
  tasks:

    - name: Confirm shutdown
      pause:
        prompt: >-
          Do you really want to shutdown machine(s) "{{play_hosts}}"? Press
          Enter to continue or Ctrl+C, then A, then Enter to abort ...

    - name: Cancel existing shutdown calls
      command: /sbin/shutdown -c
      ignore_errors: yes

    - name: Shutdown machine
      command: /sbin/shutdown -h now

Two questions on this:

  1. Is there any module available which can handle the shutdown in a more elegant way than having to run two custom commands?
  2. Is there any way to check that the machines are really down? Or is it an anti-pattern to check this from the same playbook?

I tried something with the net_ping module but I am not sure if this is its real purpose:

    - name: Check that machine is down
      become: no
      net_ping:
        dest: "{{ ansible_host }}"
        count: 5
        state: absent

This, however, fails with

FAILED! => {"changed": false, "msg": "invalid connection specified, expected connection=local, got ssh"}

Solution

  • There is no built-in shutdown module. You can use a single fire-and-forget call:

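    - name: Shutdown server
      become: yes
      # ";" instead of "&&": "shutdown -c" may exit non-zero when nothing is scheduled
      shell: sleep 2; /sbin/shutdown -c; /sbin/shutdown -h now
      # async with poll: 0 lets Ansible move on without waiting for the command
      async: 1
      poll: 0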
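
    Outside the built-in modules, the community.general collection provides a shutdown module. A minimal sketch, assuming that collection is installed on the control machine and your Ansible version supports collections; the delay value is only illustrative, and this does not cancel a previously scheduled shutdown:

    - name: Shutdown server
      become: yes
      community.general.shutdown:
        delay: 2   # seconds to wait before shutting down (illustrative value)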
    

    As for net_ping, it is meant for network appliances such as switches and routers. If you are happy to rely on ICMP to verify the shutdown, you can use something like this:

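    - name: Store actual host to be used with local_action
      set_fact:
        original_host: "{{ ansible_host }}"
    - name: Wait for ping loss
      # ping exits non-zero once the host stops replying, so the result is
      # judged by failed_when below rather than by the return code
      local_action: shell ping -q -c 1 -W 1 {{ original_host }}
      register: res
      retries: 5
      # BSD/macOS ping prints "100.0% packet loss", Linux iputils prints "100% packet loss"
      until: ('100.0% packet loss' in res.stdout or '100% packet loss' in res.stdout)
      failed_when: ('100.0% packet loss' not in res.stdout and '100% packet loss' not in res.stdout)
      changed_when: no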
    

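    This waits for 100% packet loss, retrying up to 5 times (with the default delay of 5 seconds between attempts), and fails if the host still answers after that. The exact wording depends on the ping implementation on the control machine: BSD/macOS ping prints 100.0% packet loss, while iputils ping on Linux prints 100% packet loss, so the match must cover the variant your ping prints.
    You want local_action here because otherwise the command is executed on the remote host (which is supposed to be down).
    You also want the trick of storing ansible_host in a temporary fact, because ansible_host is replaced with 127.0.0.1 when the task is delegated to the local host.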
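
    If ICMP is filtered on your network, a similar check can be made against the SSH port instead. A minimal sketch using the wait_for module, assuming SSH listens on port 22 and reusing the original_host fact set above; the timeout value is only illustrative:

    - name: Wait for SSH port to close
      # become is disabled so the local task does not try to escalate on the control machine
      become: no
      local_action:
        module: wait_for
        host: "{{ original_host }}"
        port: 22          # assumption: default SSH port
        state: stopped    # succeed once connections to the port fail
        timeout: 60       # illustrative; give up after 60 seconds

    A closed SSH port only shows that sshd is no longer reachable, so this is a weaker signal than lost pings if the machine hangs late in the shutdown sequence.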