
Ansible SSH connection timeout after performing a reboot


I am new to Ansible. I am trying to check whether a system requires a reboot and have Ansible perform the reboot if it does. I can identify whether the system needs to be rebooted, but after the reboot the system takes some time to become functional (in started state), and by default Ansible makes 5 SSH connection attempts to verify that the system is up. There is a default setting of 5 SSH attempts in ansible.cfg; is it possible to override that value inside the playbook?

Is there a way to increase the delay between retries while keeping the number of SSH attempts the same (e.g. 5)? I tried to achieve this but failed.

- name: Check for require reboot
  hosts: "{{ target_host }}"
  remote_user: root
  gather_facts: False
  tasks:
    - name: Performing reboot check now after patch integration
      command: needs-restarting -r
      register: needsrestarting
      changed_when:
        - needsrestarting.rc != 0
      failed_when:
        - needsrestarting.rc != 1
        - needsrestarting.rc != 0

    - name: Reboot the server
      tags: reboot
      command: reboot
      async: 1
      poll: 0
#      when:
#        - needsrestarting.rc == 1

    - name: Adding delay for system to come up
      wait_for:
       timeout: 300

    - name: Wait for the nodes to wake up for login with fixed timer
      wait_for:
        host: "{{ ansible_ssh_host }}"
        port: "{{ ansible_ssh_port }}"
        timeout: 300
        state: started
      delegate_to: "{{ inventory_hostname }}"
      ignore_errors: yes
      delay: 60
      retries: 20

logs:

 UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017

Solution

  • You need to delegate the wait_for task to the Ansible controller:

    - name: Adding delay for system to come up
      wait_for:
        timeout: 300
      delegate_to: localhost
    

    Otherwise, the wait_for task is executed on the target, which is unreachable while it is rebooting. wait_for is a general-purpose module that serves many purposes: sometimes it makes sense to run it remotely, sometimes not.
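
    Applied to the playbook from the question, the reboot and wait steps could look roughly like the sketch below. It is a sketch under assumptions rather than a tested playbook: it re-enables the commented-out `when` condition, relies on the `needsrestarting` result registered by the first task, assumes SSH listens on port 22, and uses illustrative delay and timeout values.

    ```yaml
    - name: Reboot the server
      tags: reboot
      command: reboot
      async: 1
      poll: 0
      when: needsrestarting.rc == 1   # needs-restarting -r exits with 1 when a reboot is required

    - name: Wait on the controller until SSH on the target accepts connections again
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22          # assumption: SSH listens on the default port
        state: started
        delay: 60         # illustrative: give the machine time to actually go down first
        timeout: 300
      delegate_to: localhost   # runs on the controller, not on the rebooting target
      when: needsrestarting.rc == 1
    ```

    The `delay` matters here: without it, the first check may run before the target has shut down and find the old SSH daemon still listening.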

    Conversely, you may prefer to run the whole play locally and delegate individual tasks to the targets where needed. The choice may depend on how many tasks have to run locally and how many on the targets:

    - name: Check for require reboot
      hosts: "{{ target_host }}"
      remote_user: root
      gather_facts: False
      connection: local
      tasks:
        ...
    

    Note that local execution does not change which hosts are targeted: the play still iterates over the inventory hosts as usual. You can address a whole list of targets purely through local execution (VMware modules, for instance, run on the controller and create disks on the inventory targets).

    Following the wait_for examples in the Ansible docs, you can also make the wait condition specific to SSH connectivity (the default "timeout" is 300 seconds):

    # Do not assume the inventory_hostname is resolvable and delay 10 seconds at start
    - name: Wait 300 seconds for port 22 to become open and contain "OpenSSH"
      wait_for:
        port: 22
        host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
        search_regex: OpenSSH
        delay: 10
      connection: local
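
    The question also asks how to lengthen the pause between attempts. With wait_for, that is governed by the module's own polling parameters rather than by the SSH retry setting: `delay` postpones the first check, `sleep` sets the interval between checks (1 second by default), and `connect_timeout` bounds each individual connection attempt. A sketch with illustrative values:

    ```yaml
    - name: Poll port 22 every 15 seconds for up to 10 minutes
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 30            # wait 30 s before the first check
        sleep: 15            # pause 15 s between checks
        connect_timeout: 5   # abandon a single connection attempt after 5 s
        timeout: 600         # fail the task if the port is not ready after 600 s
      connection: local
    ```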
    

    Or use the dedicated wait_for_connection module for that purpose:

    # Wake desktops, wait for them to become ready and continue playbook
    - hosts: all
      gather_facts: no
      tasks:
      - name: Send magic Wake-On-Lan packet to turn on individual systems
        wakeonlan:
          mac: '{{ mac }}'
          broadcast: 192.168.0.255
        delegate_to: localhost
    
      - name: Wait for system to become reachable
        wait_for_connection:
    
      - name: Gather facts for first time
        setup:
      ...
    

    Note that the first task is executed locally: the targets are not yet available, but they are the destination of the magic packets.

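    After that, the wait_for_connection task does not need to be delegated, because the module is built to work this way: it keeps retrying the connection to the target from the controller until the target responds or the timeout expires.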
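
    For the remaining part of the question, here is a sketch that combines both settings. It assumes the default ssh connection plugin, whose retry count can be set per play through the `ansible_ssh_retries` variable (the play-level counterpart of the `retries` key in the `[ssh_connection]` section of ansible.cfg); verify the variable name against the connection plugin documentation for your Ansible version. The pause between reachability checks comes from wait_for_connection's own `delay` and `sleep` parameters, not from the SSH retry count. All numeric values are illustrative.

    ```yaml
    - hosts: "{{ target_host }}"
      remote_user: root
      gather_facts: false
      vars:
        ansible_ssh_retries: 5    # assumption: ssh plugin retry count, overridden for this play only
      tasks:
        - name: Wait for the rebooted system to accept connections again
          wait_for_connection:
            delay: 60             # do not try at all during the first 60 s
            sleep: 20             # pause 20 s between connection attempts
            connect_timeout: 10   # give each attempt at most 10 s
            timeout: 900          # give up after 15 minutes in total

        - name: Gather facts now that the host is back
          setup:
    ```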