
Ansible playbook fails instead of running rescue block


I have the following Ansible playbook:

- hosts: <host>
  become: true
  vars:
    dest_dir: <dir>
  roles:
    - {role: '<role>', tags: '<tag>'}

The called <role> consists of a single task file:

- name: Copy and check file.
  block:
    - name: Copy
      ansible.builtin.copy:
        src: <source file>
        dest: "{{dest_dir}}"
        mode: 0666
        owner: root
        group: root
        backup: true
      register: copy_result
    - name: Print result
      ansible.builtin.debug:
        var: copy_result
    - name: Validate
      ansible.builtin.shell: <script> 1
      when: copy_result is changed
  rescue:
    - name: Revert
      ansible.builtin.copy:
        remote_src: true
        src: "{{copy_result.backup_file}}"
        dest: "{{dest_dir}}"
      when: copy_result.backup_file is defined

The <script> called by 'Validate' is a dummy that simply exits with the exit code supplied as its argument:

exit $1

(This code is based on the Ansible FAQ answer at https://docs.ansible.com/ansible/devel/reference_appendices/faq.html#the-validate-option-is-not-enough-for-my-needs-what-do-i-do.)

If I run this when the destination file already exists but has different content from the source file, Ansible correctly copies the source file to the destination. I would then expect the failing 'Validate' task to trigger the 'rescue' section, which should roll back the copy. However, the 'rescue' section is never invoked; instead, the playbook aborts at the 'Validate' task with the message:

TASK [<task> : Validate] ******************************************************
fatal: [<host>]: FAILED! => {"changed": true, "cmd": "<script> 1", "delta": "0:00:00.010525", "end": "2024-09-17 10:49:23.559180", "msg": "non-zero return code", "rc": 1, "start": "2024-09-17 10:49:23.548655", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

It then enters the debugger.

I wondered whether the 'when' condition on the 'Revert' task in the 'rescue' section was at fault, but commenting it out resulted in the same behaviour.

Can anyone identify what I am doing wrong here?


Solution

  • First, note that the ansible.builtin.copy module has a validate option. Using that, you could rewrite your playbook like this:

    - hosts: localhost
      gather_facts: false
      vars:
        dest_dir: /tmp
      tasks:
      - name: Copy
        ignore_errors: true
        ansible.builtin.copy:
          src: testfile
          dest: "{{dest_dir}}/testfile"
          mode: 0666
          owner: root
          group: root
          backup: true
          validate: sh -c 'exit 1' -- %s
          # in practice, this would probably be a path to a script, like:
          #
          # validate: /path/to/validation-script %s
          #
          # The `%s` gets replaced by the path to a temporary file with the
          # content to validate.
    
    

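    You get effectively the same behavior with significantly less code: the validation command runs against a temporary file before it is moved into place, so if validation fails, the task fails and the destination file is never modified, leaving nothing to roll back.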
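    For a concrete case, the ansible.builtin.copy documentation uses validate with visudo so that a broken sudoers file is never installed. A sketch along those lines, assuming a local file named sudoers-extra and a drop-in destination /etc/sudoers.d/extra (both names are illustrative, not from your setup):

    ```yaml
    - hosts: localhost
      gather_facts: false
      become: true
      tasks:
      - name: Install a sudoers drop-in only if visudo accepts it
        ansible.builtin.copy:
          src: sudoers-extra           # illustrative local file name
          dest: /etc/sudoers.d/extra   # illustrative destination
          mode: "0440"
          owner: root
          group: root
          # visudo checks the temporary file substituted for %s
          # (-c: check only, -f: file to check). If the check fails,
          # the task fails and the destination is left untouched.
          validate: /usr/sbin/visudo -cf %s
    ```

    Any command that exits non-zero on bad input can play the role of visudo here, including your own validation script.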


    With respect to your question, I cannot reproduce the error you show. Since you didn't include a minimal, reproducible example in your question, I've crafted the following one:

    - hosts: localhost
      gather_facts: false
      vars:
        dest_dir: /tmp
      tasks:
      - block:
          - name: Copy
            ansible.builtin.copy:
              src: testfile
              dest: "{{dest_dir}}"
              mode: 0666
              owner: root
              group: root
              backup: true
            register: copy_result
    
          - name: Print result
            ansible.builtin.debug:
              var: copy_result
    
          - name: Validate
            ansible.builtin.shell: "exit 1"
            when: copy_result is changed
        rescue:
          - name: Revert
            ansible.builtin.copy:
              remote_src: true
              src: "{{copy_result.backup_file}}"
              dest: "{{dest_dir}}"
            when: copy_result.backup_file is defined
    

    Running this produces (I'm using the unixy stdout callback to reduce output):

    - localhost on hosts: localhost -
    Copy...
      localhost done
    Print result...
      localhost ok: {
        "changed": false,
        "copy_result": {
            "backup_file": "/tmp/testfile.893626.2024-09-17@07:11:55~",
            "changed": true,
            "checksum": "22596363b3de40b06f981fb85d82312e8c0ed511",
            "dest": "/tmp/testfile",
            "diff": [],
            "failed": false,
            "gid": 0,
            "group": "root",
            "md5sum": "6f5902ac237024bdd0c176cb93063dc4",
            "mode": "0666",
            "owner": "root",
            "secontext": "unconfined_u:object_r:user_tmp_t:s0",
            "size": 12,
            "src": "/home/lars/.ansible/tmp/ansible-tmp-1726571514.9080791-893587-119392786896945/.source",
            "state": "file",
            "uid": 0
        }
    }
    Validate...
      localhost failed | msg: non-zero return code
    Revert...
      localhost ok
    
    - Play recap -
      localhost                  : ok=3    changed=1    unreachable=0    failed=0    rescued=1    ignored=0
    

    We can clearly see that the "Revert" task is running. However, note that the "Revert" task does not do what you want. In the above example, our target file is /tmp/testfile, which means the backup file is named something like /tmp/testfile.893626.2024-09-17@07:11:55~. This makes the "Revert" task the equivalent of:

    - name: Revert
      ansible.builtin.copy:
        remote_src: true
        src: "/tmp/testfile.893626.2024-09-17@07:11:55~"
        dest: "/tmp"
      when: copy_result.backup_file is defined
    

    And that's a no-op: because dest is a directory, the copy module writes into that directory using the source file's own name, so you're copying the backup file onto itself. If you want to restore the target file, you need to provide an explicit filename in the dest parameter rather than a directory; that makes the playbook look like this:

    - hosts: localhost
      gather_facts: false
      vars:
        dest_dir: /tmp
      tasks:
      - block:
          - name: Copy
            ansible.builtin.copy:
              src: testfile
              dest: "{{dest_dir}}/testfile"
              mode: 0666
              owner: root
              group: root
              backup: true
            register: copy_result
    
          - name: Print result
            ansible.builtin.debug:
              var: copy_result
    
          - name: Validate
            ansible.builtin.shell: "exit 1"
            when: copy_result is changed
        rescue:
          - name: Revert
            ansible.builtin.copy:
              remote_src: true
              src: "{{copy_result.backup_file}}"
              dest: "{{dest_dir}}/testfile"
            when: copy_result.backup_file is defined
    

    Running that successfully reverts the file when the "Validate" task fails.
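    A variation that avoids repeating the file name: the registered copy result contains a dest key with the full path that was written (visible as "dest": "/tmp/testfile" in the output above), so the rescue tasks can use it even when dest was given as a directory. The sketch below also removes the file when no backup exists (i.e. the file did not exist before), and then fails explicitly, because a rescued block otherwise counts as a success (note failed=0 and rescued=1 in the recap above). Those two extra tasks are suggestions, not something your original playbook required:

    ```yaml
    - hosts: localhost
      gather_facts: false
      vars:
        dest_dir: /tmp
      tasks:
      - block:
          - name: Copy
            ansible.builtin.copy:
              src: testfile
              dest: "{{dest_dir}}"   # a directory is fine here
              mode: 0666
              owner: root
              group: root
              backup: true
            register: copy_result

          - name: Validate
            ansible.builtin.shell: "exit 1"
            when: copy_result is changed
        rescue:
          # copy_result.dest is the full path of the written file,
          # e.g. /tmp/testfile, even though dest above is a directory.
          - name: Revert to the backup
            ansible.builtin.copy:
              remote_src: true
              src: "{{copy_result.backup_file}}"
              dest: "{{copy_result.dest}}"
            when: copy_result.backup_file is defined

          # No backup means the file did not exist before the copy.
          - name: Remove the newly created file
            ansible.builtin.file:
              path: "{{copy_result.dest}}"
              state: absent
            when: copy_result is changed and copy_result.backup_file is not defined

          # Rescue marks the host as succeeded; fail explicitly so the
          # validation error is still reported after the rollback.
          - name: Report the failure
            ansible.builtin.fail:
              msg: "Task '{{ ansible_failed_task.name }}' failed; changes were rolled back."
    ```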