
Finding all folders/files named in a yyyymmdd_hhmmss format between two dates and times in Ansible


I am trying to quickly find all folders named in a yyyymmdd_hhmmss format that fall between two dates and times. Those dates and times are variables set from user input.

E.g., all folders between

20221231_120000
20230101_235920

It is not a requirement for me that every date/time being looked for is a valid one.

Note that the 'age' of the folders does not match their names.


I have looked at regex but it seems like a complex solution for variable dates/times.

I have looked at the Ansible find module's patterns, but my approach is incredibly slow because it runs the find module once for every sequential number, taking about 1 second per checked number.

For example:

  - name: Find folders matching dates and times
    vars:
      startdate: "20230209"
      enddate: "20230209"
      starttime: "120000"
      endtime: "130000"
    ansible.builtin.find:
      paths:
        - "/folderstocheck/"
      file_type: directory
      patterns: "{{ item[0:8] }}_{{ item[8:] }}"
    with_sequence: start={{ startdate + starttime }} end={{ enddate + endtime }}
    register: found_files

This takes approximately 167 minutes to run.


Solution

  • Regarding

    Note that the 'age' of the folders does not match their names.

    I would recommend aligning the folders' access and modification times with their names, so that simple OS functions or Ansible modules like stat can be used. That would make any processing a lot easier.

    How to do that? I have a somewhat similar use case, "Change creation time of files (RPM) from download time to build time", which shows the idea and how one could achieve it.
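
    One way this could be done directly from Ansible is sketched below, using the file module's modification_time and access_time parameters. This is only an illustration under assumptions, not necessarily the approach of the linked post: base_dir and dir_names are hypothetical variables, names that are not valid calendar dates (such as 20221232_000000) would make the task fail, and a directory's modification time changes again whenever entries are created or removed inside it.

    ```yaml
    # Sketch only: base_dir and dir_names are hypothetical variables.
    - name: Align access and modification time with the directory name
      ansible.builtin.file:
        path: "{{ base_dir }}/{{ item }}"
        state: directory
        # Parse the directory name itself as the timestamp
        modification_time: "{{ item }}"
        modification_time_format: "%Y%m%d_%H%M%S"
        access_time: "{{ item }}"
        access_time_format: "%Y%m%d_%H%M%S"
      loop: "{{ dir_names }}"   # e.g. ['20221231_120000', '20230101_120000']
    ```

    Once the timestamps match the names, date-based selection could rely on the file times instead of parsing names.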


    Given some test directories as input

    :~/test$ tree 202*
    20221231_110000
    20221231_120000
    20221231_130000
    20221232_000000
    20230000_000000
    20230101_000000
    20230101_010000
    20230101_020000
    20230101_030000
    20230101_120000
    20230101_130000
    

    a minimal example playbook

    ---
    - hosts: localhost
      become: false
      gather_facts: false
    
      vars:
    
        FROM: "20221231_120000"
        TO: "20230101_120000"
    
      tasks:
    
      - name: Get an unordered list of directories with pattern 'yyyymmdd_hhmmss'
        find:
          path: "/home/{{ ansible_user }}/test/"
          file_type: directory
          use_regex: true
          patterns: "^[1-2]{1}[0-9]{7}_[0-9]{6}" # can be more specified
        register: result
    
      - name: Order list
        set_fact:
          dir_list: "{{ result.files | map(attribute='path') | map('basename') | community.general.version_sort }}"
    
      - name: Show directories between
        debug:
          msg: "{{ item }}"
        when: item is version(FROM, '>=') and item is version(TO, '<=') # means between
        loop: "{{ dir_list }}"
    

    results in the following output

    TASK [Get an unordered list of directories with pattern 'yyyymmdd_hhmmss'] *****
    ok: [localhost]
    
    TASK [Order list] **************************
    ok: [localhost]
    
    TASK [Show directories between] ************
    ok: [localhost] => (item=20221231_120000) =>
      msg: '20221231_120000'
    ok: [localhost] => (item=20221231_130000) =>
      msg: '20221231_130000'
    ok: [localhost] => (item=20221232_000000) =>
      msg: '20221232_000000'
    ok: [localhost] => (item=20230000_000000) =>
      msg: '20230000_000000'
    ok: [localhost] => (item=20230101_000000) =>
      msg: '20230101_000000'
    ok: [localhost] => (item=20230101_010000) =>
      msg: '20230101_010000'
    ok: [localhost] => (item=20230101_020000) =>
      msg: '20230101_020000'
    ok: [localhost] => (item=20230101_030000) =>
      msg: '20230101_030000'
    ok: [localhost] => (item=20230101_120000) =>
      msg: '20230101_120000'
    

    Some measurements

    Get an unordered list of directories with pattern 'yyyymmdd_hhmmss' -- 0.50s
    Show directories between --------------------------------------------- 0.24s
    Order list ----------------------------------------------------------- 0.09s
    

    According to the initial description, no time zone or daylight saving time is involved. So this works because the given pattern is effectively just an incrementing number, even if a human may interpret it as a date. It could be simplified further if more were known about the time part: if it were always 1200, for example, that insignificant part could be dropped, leaving a simple integer. The same is true for the delimiter _.
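
    Because every name has the same width and the delimiter always sits at the same position, plain string comparison orders the names the same way their numeric value would. Under that assumption, the final loop could be collapsed into a single expression with Jinja2's built-in ge and le tests. A minimal sketch; dirs_between is a name introduced here for illustration, while dir_list, FROM and TO are the variables from the playbook above:

    ```yaml
    # Sketch only: relies on fixed-width names so that string order equals numeric order.
    - name: Select directories between FROM and TO without a loop
      ansible.builtin.set_fact:
        dirs_between: "{{ dir_list | select('ge', FROM) | select('le', TO) | list }}"

    - name: Show directories between
      ansible.builtin.debug:
        var: dirs_between
    ```

    This keeps the selection in one templating step instead of evaluating a condition per loop item.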


    Regarding

    ... they are incredibly slow, because it runs the find command for every sequential number ... with_sequence ...

    that is not necessary, and it looks to me like a case of "How do I optimize performance of Ansible playbook with regards to SSH connections?"

    Looping over a command and passing it one parameter per run causes a lot of overhead, including multiple SSH connections. Passing the whole list to the command directly, where that is possible, can increase performance and reduce runtime and resource consumption.

    Further processing can then be done afterwards.
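
    As an illustration of handing a list to the module instead of looping, find accepts a list of patterns, so several day prefixes can be checked in one call. A minimal sketch, assuming shell-style globs (the module's default) and a hypothetical day_patterns list; the path is the one from the question:

    ```yaml
    # Sketch only: day_patterns is a hypothetical list, one glob per day of interest.
    - name: Find directories for several days in a single module call
      vars:
        day_patterns:
          - "20221231_*"
          - "20230101_*"
      ansible.builtin.find:
        paths:
          - "/folderstocheck/"
        file_type: directory
        patterns: "{{ day_patterns }}"
      register: found_dirs
    ```

    The registered result could then be narrowed to the exact start and end times afterwards, in the same way as in the playbook above.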