
Ansible SSH connection drops (fails) for one task while it works for the other tasks


Here is my playbook, which has three tasks:

- name: Play 2- Configure Source nodes
  hosts: all_hosts

  vars:
    ansible_ssh_extra_args: -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50
    ansible_ssh_private_key_file: /app/misc_automation/ssh_keys/id_rsa

  gather_facts: false
  tasks:

   - name: Get Process Dump for tomcat on non-Solaris
     ignore_errors: yes
     block:
       - raw: ps -ef | grep java | grep -i tomcat | grep -v grep
         ignore_errors: yes
         register: tomjavadump

       - raw: ps -ef | grep java | grep -i tomcat | grep -v grep | wc -l
         ignore_errors: yes
         register: tomjavadumpcount

       - raw: "echo '<tr><td>{{ inventory_hostname }}</td></tr>'"
         delegate_to: localhost
         when: tomjavadump.rc == 0 and patchthistomcat is undefined

I ran the above playbook in debug mode. As the debug output below shows, the exact same SSH connection works for the other two tasks but fails for the first one:

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1260
<10.0.0.211> ESTABLISH SSH CONNECTION FOR USER: root
<10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep'
<10.0.0.211> (1, '', '')
<10.0.0.211> Failed to connect to the host via ssh:
fatal: [10.0.0.211]: FAILED! => {
    "changed": true,
    "msg": "non-zero return code",
    "rc": 1,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}
...ignoring

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1267
<10.0.0.211> ESTABLISH SSH CONNECTION FOR USER: root
<10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep | wc -l'
<10.0.0.211> (0, '0\n', '')
changed: [10.0.0.211] => {
    "changed": true,
    "rc": 0,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "0\n",
    "stdout_lines": [
        "0"
    ]
}

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1270
skipping: [10.0.0.211] => {
    "changed": false,
    "skipped": true,

I have no idea why, but the same code works fine if I change the target server from 10.0.0.211 to another host.

Why does the exact same SSH connection work for the other tasks but fail for the first one?

How can I fix this issue?

Here is the maximum-verbosity debug output of the SSH connections for both the failing and the passing tasks: https://filebin.net/8v5xy28edtaz0bhh/ansible_ssh_issue.txt?t=o4l9o4d1


Solution

  • The problem here is the raw module, which runs your shell commands directly on the remote node over SSH (presumably because that node does not have Python installed).

    This means Ansible runs your command roughly like this:

    ssh user@ip "commands"
    

    In your case:

    <10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep'
    

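    Ansible marks the task as failed if either SSH fails or the command returns a non-zero exit code. Here SSH actually worked: grep exits with 1 when it selects no lines, and the wc -l task shows that no Tomcat process is running on 10.0.0.211, so the first pipeline returns rc=1. That is most likely also why the same task passes on other hosts, where Tomcat is running. The empty "Failed to connect to the host via ssh:" line is misleading: Ansible's SSH connection plugin logs that message at -vvv for any remote exit code from 1 to 254, even though the connection itself succeeded.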

    To preserve the return code of your command while keeping the task's own rc at zero, you can use:

    - raw: ps -ef | grep java | grep -i tomcat | grep -v grep; awk -v rc=$? 'BEGIN{print "rc="rc}'
      register: tomjavadump
    

    This captures the exit status of the last grep in the pipeline ($?) and prints it with awk. Because awk only runs a BEGIN block, it always exits with 0, so the raw task no longer fails.

    Once stdout is registered, you can search it for rc=0 or rc=1 (for example in tomjavadump.stdout), depending on your requirement.

    If you do not care about the return code at all, just append || true to your command.
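To reproduce the behaviour locally, here is a minimal shell sketch. FAKE_PS is a hypothetical stand-in for ps -ef output on a host that runs Java but no Tomcat; it is made-up sample text, not output from 10.0.0.211. The script runs the original pipeline, the awk variant and the || true variant, and prints the exit status of each.

```shell
#!/bin/sh
# FAKE_PS: hypothetical ps -ef output with a Java process but no Tomcat
# (an assumption for illustration, not real output from the target host).
FAKE_PS='root  1  0  0 10:00 ?  00:00:01 /usr/sbin/sshd -D
root 42  1  0 10:01 ?  00:00:00 /usr/bin/java -jar app.jar'

# 1) Original command: "grep -i tomcat" selects no lines, so the last grep
#    in the pipeline also selects nothing and the pipeline exits with 1.
printf '%s\n' "$FAKE_PS" | grep java | grep -i tomcat | grep -v grep
echo "original pipeline rc=$?"

# 2) awk variant: $? (the pipeline's status) is passed to awk and printed;
#    awk only runs a BEGIN block, so the whole command exits with 0.
printf '%s\n' "$FAKE_PS" | grep java | grep -i tomcat | grep -v grep; awk -v rc=$? 'BEGIN{print "rc="rc}'
echo "awk workaround rc=$?"

# 3) || true variant: the return code is discarded and forced to 0.
printf '%s\n' "$FAKE_PS" | grep java | grep -i tomcat | grep -v grep || true
echo "|| true rc=$?"
```

Running it prints "original pipeline rc=1", then "rc=1" from awk, then "awk workaround rc=0" and "|| true rc=0". With the awk variant registered as tomjavadump, the last task's condition could, for example, become when: "'rc=0' in tomjavadump.stdout and patchthistomcat is undefined" instead of checking tomjavadump.rc.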