
ansible.builtin.shell not working with disown and weird output redirection


I'm having a hard time understanding Ansible's overall remote behaviour.

I'm referring to both the ansible.builtin.command and ansible.builtin.shell modules.

It seems not to collect any output from scripts that are put in the background and disowned.

It also seems to fail to properly disown jobs (nohup seems to work just fine apart from redirections, so there is still the problem of losing all of the background job's output).

I want to install a templated Python script on multiple remote machines, run it, and detach it from Ansible's connection so that it keeps running in the background (it makes a blocking call that listens on a TCP port). The Python script is a couple of imports, some print statements and then a call to server.start(), which blocks while listening on a TCP port.

I wrote this task:

  - name: start services  # not working clearly
    ansible.builtin.shell:
      executable: /bin/bash
      cmd: |
        python3 <<-'EOF' &
        {{ lookup("ansible.builtin.template", "../scripts/server_start.tmpl", template_vars={"service": item}) }}
        EOF
        disown -h %1
    loop: "{{ hostvars[inventory_hostname].services }}"

which does not work at all, as:

  • output (any print() statements) from my Python script is never captured by Ansible, although I thought that only stdin is detached after calling disown,
  • the script gets killed along with the Ansible process; it does not persist on the remote machine

Can anyone give a logical explanation for this? I have tested it locally and it works as expected. I have even tested the script locally by running /bin/bash -c 'python3 <<-'EOF' ...', as that's how it's executed by Ansible, I suppose. I have even tested it via SSH and it still works. What is it about Ansible that makes it lose the output and fail to disown the job correctly?

As a side note, Ansible captures the output correctly if I omit the blocking server.start() call. I have also replaced the server.start() call with a long sleep(), and that does not work either.


Solution

  • For anyone running into similar trouble with processes spawned remotely using Ansible: https://github.com/ansible/ansible/issues/33410. It seems that there is a recurring problem across Ansible versions which results in all child processes being killed when the remote connection is terminated. This seems to answer most of my doubts.

    I have changed the task to use nohup, and the Python script to use a logger with a file handler (as I mentioned in the original post, redirecting the script's stdout did not work).
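
The file-handler part of that change might look roughly like the minimal sketch below. The original script is not shown in the post, so the helper name build_logger, the logger name, the log path in the usage comment and the format string are all assumptions made for illustration. The idea is only that messages go to a file on the remote machine instead of to the stdout that Ansible fails to capture.

```python
import logging


def build_logger(log_path, name="server_start"):
    """Return a logger that appends records to log_path.

    build_logger and its default name are hypothetical; they only
    illustrate replacing print() with a file-backed logger.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not pass records up to the root logger, which may write to stderr.
    logger.propagate = False
    # Attach the file handler only once, even if called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Possible usage inside the templated script (path is an assumption):
#   log = build_logger("/tmp/server_start.log")
#   log.info("starting service")
#   server.start()  # the blocking call from the original script
```

FileHandler flushes after every record, so the log file can be followed on the remote host while the server is still blocked in server.start().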