Is there an elegant way to check file integrity with md5 in ansible using md5 files fetched from server?


I have several files on a server that I need to download from an ansible playbook, but because the connection is likely to be interrupted, I would like to check their integrity after download.

I'm considering two approaches:

  1. Store the md5 of those files in ansible as vars
  2. Store the md5 of those files on the server as files with the extension .md5. Such a pair would look like: file.extension and file.extension.md5.

The first approach introduces overhead in maintaining the md5s in ansible: every time someone adds a new file, they need to make sure they add the md5 in the right place.

On the plus side, this approach is already supported: the get_url action has a built-in check through its checksum parameter. E.g.:

- get_url: url=http://example.com/path/file.conf dest=/etc/foo.conf checksum=md5:66dffb5228a211e61d6d7ef4a86f5758
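
For illustration, the first approach could be laid out as a list of files kept in vars and looped over. This is only a sketch: the variable name files_to_fetch is hypothetical, and the single entry reuses the URL, destination, and hash from the example above.

- hosts: all
  vars:
    # Hypothetical variable: one entry per file, with its expected md5
    files_to_fetch:
      - url: http://example.com/path/file.conf
        dest: /etc/foo.conf
        md5: 66dffb5228a211e61d6d7ef4a86f5758
  tasks:
    - name: Download each file and verify it against the md5 stored in vars
      get_url:
        url: "{{ item.url }}"
        dest: "{{ item.dest }}"
        checksum: "md5:{{ item.md5 }}"
      loop: "{{ files_to_fetch }}"

Every new file then means one more entry in files_to_fetch, which is the maintenance overhead described above.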

The second approach is more elegant and narrows the responsibility. When someone adds a new file on the server, they will make sure to add the .md5 as well and won't even need to touch the ansible playbooks.

Is there a way to use the checksum approach to match the md5 from a file?


Solution

  • If you wish to go with your method of storing the checksum in files on the server, you can definitely use the get_url checksum arg to validate it.

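    Download the .md5 file and read it into a var. Keep in mind that lookup runs on the control node, so the .md5 file has to be there rather than on the managed host: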

    - set_fact:
        md5_value: "{{ lookup('file', '/etc/myfile.md5') }}"
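
    Because lookup('file', ...) reads from the control node, one way to get the .md5 file there is to delegate its download to localhost. A minimal sketch, assuming a hypothetical URL and temporary path, and assuming the file holds either a bare hash or the md5sum-style "hash  filename" layout:

    - name: Fetch the .md5 file onto the control node
      get_url:
        url: http://example.com/path/file.conf.md5   # hypothetical URL
        dest: /tmp/file.conf.md5                      # hypothetical path
        force: true
      delegate_to: localhost

    - name: Keep only the hash, dropping any trailing filename
      set_fact:
        md5_value: "{{ lookup('file', '/tmp/file.conf.md5').split()[0] }}"

    Taking the first whitespace-separated token matters because get_url expects only the hex digest after the md5: prefix.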
    

    And then when you download the file, pass the contents of md5_value to get_url:

    - get_url:
        url: http://example.com
        dest: /my/dest/file
        checksum: "md5:{{ md5_value }}"
        force: true
    

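    Note that it is vital to specify a path to a file in dest. If you set this to a directory, get_url chooses the filename itself (from the server or from the url), and according to its documentation force then has no effect, so the behavior changes significantly.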

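    Note also that you probably need force: true, which causes the file to be downloaded again every time you run the playbook. At least in older Ansible releases, the checksum is only checked when a file is actually downloaded: if the file already exists on your host, get_url won't validate the sum of the existing file, which might not be desirable. Newer releases document that get_url compares checksum against an existing dest and skips the download on a match, so check the get_url documentation for your version.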

    To avoid the download every time you could stat to see if the file already exists, see what its sum is, and set the force param conditionally.

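    - stat:
        path: /my/dest/file
        # Newer Ansible releases no longer return stat.md5; request an md5 in stat.checksum instead
        checksum_algorithm: md5
      register: existing_file
    
    - set_fact:
        force_new_download: "{{ existing_file.stat.checksum != md5_value }}"
      when: existing_file.stat.exists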
    
    - get_url:
        url: http://example.com
        dest: /my/dest/file
        checksum: "md5:{{ md5_value }}"
        force: "{{ force_new_download | default(false) }}"
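
    If you would rather fail loudly than rely on get_url alone, the file on disk can also be re-checked explicitly after the download. A minimal sketch, assuming md5_value holds only the hex digest (the register name downloaded_file is illustrative):

    - name: Compute the md5 of the downloaded file
      stat:
        path: /my/dest/file
        checksum_algorithm: md5
      register: downloaded_file

    - name: Fail if the file on disk does not match the expected md5
      assert:
        that:
          - downloaded_file.stat.exists
          - downloaded_file.stat.checksum == md5_value
        fail_msg: "md5 mismatch for /my/dest/file"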
    

    Also, if you are pulling the sums/artifacts from a web server, you can read the value of the sum straight from its url without first saving the .md5 file to disk. Here is an example using a Nexus server that hosts the artifacts and their sums:

    - set_fact:
        md5_value: "{{ item }}"
      with_url: http://my_nexus_server.com:8081/nexus/service/local/artifact/maven/content?g=log4j&a=log4j&v=1.2.9&r=central&e=jar.md5
    

    This could be used in place of using get_url to download the md5 file and then using lookup to read from it.
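
    Putting the pieces together for the file.extension / file.extension.md5 convention from the question, the sum can be fetched per file with the url lookup and handed to get_url. This is a minimal sketch: the base URL, file names, and destination directory are hypothetical, and each .md5 file is assumed to start with the hex digest.

    - name: Download each file and verify it against its .md5 companion
      get_url:
        url: "http://example.com/path/{{ item }}"
        dest: "/my/dest/{{ item }}"
        # The url lookup runs on the control node; keep only the first token (the hash)
        checksum: "md5:{{ lookup('url', 'http://example.com/path/' ~ item ~ '.md5').split()[0] }}"
      loop:
        - file.conf
        - other.conf

    With this layout, adding a file on the server only requires adding its name to the loop list. Recent get_url documentation also describes passing a URL directly in checksum (for example checksum: "md5:http://example.com/path/file.conf.md5"); whether that form is available depends on your Ansible version.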