
Slow Ansible performance when using loop with large YAML var


Hello Developer Community!

I have been developing some Ansible playbooks to manage Citrix NetScaler configuration and would like some help with the following. I have this data structure defined in a variable named nsapp_lb_server:

nsapp_lb_server:
    - name:                      "SRV-1"
      ipaddress:                 "10.102.102.1"
      comment:                   "Chewbacca"

    - name:                      "SRV-2"
      ipaddress:                 "10.102.102.2"
      comment:                   "C-3PO"

    - name:                      "SRV-3"
      ipaddress:                 "10.102.102.3"
      comment:                   "Obi-Wan Kenobi"
...

[+ another 1200 items...]

and I have the following task:

  - name: "Check variables (loop)"
    ansible.builtin.assert:
        that:
            - ( (item.name is defined) and (item.name | length > 0) )
            - ( (item.ipaddress is defined) and (item.ipaddress | ipaddr() == item.ipaddress) )
            - ( (item.comment is not defined) or (item.comment | length > 0) )
    loop: "{{ nsapp_lb_server }}"

My problem is that when the nsapp_lb_server variable holds thousands of records, the loop is incredibly slow: the task takes 30 minutes to finish, which is far too long... :-(

After some digging online, it seems the issue is caused by Ansible's "loop" keyword, so I would like to know whether there are other methods I can use instead of it.

Are there any alternatives to Ansible's "loop" that can produce the same result (iterating over the entries of the variable)? I was thinking about using json_query, but I still don't know how to implement it in this specific case.

My environment:

$ ansible --version
ansible [core 2.12.6]
  config file = /home/ansible/.ansible.cfg
  configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/ansible/lib/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ansible/ansible/bin/ansible
  python version = 3.9.7 (default, Sep 21 2021, 00:13:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
  jinja version = 3.0.3
  libyaml = True

Could anyone please point me in the right direction? I have been working on this code for a very long time, and after testing it with a large data set, it seems useless because of the running time. I have also checked the hardware resources allocated to the VM the Ansible controller runs on, and nothing looks problematic.

Many thanks in advance!


Solution

  • Running this validation as a loop over thousands of items is very slow, because every item goes through its own task execution and callback processing. You can instead do it in a single task, with the caveat that it will be harder to track down the invalid list item(s):

    - hosts: localhost
      gather_facts: false
      vars:
        nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
        nsapp_lb_samples:
            - name:                      "SRV-1"
              ipaddress:                 "10.102.102.1"
              comment:                   "Chewbacca"
            - name:                      "SRV-2"
              ipaddress:                 "10.102.102.2"
              comment:                   "C-3PO"
            - name:                      "SRV-3"
              ipaddress:                 "10.102.102.3"
              comment:                   "Obi-Wan Kenobi"
      tasks:
        - assert:
            that:
              - nsapp_lb_server | rejectattr('name') | length == 0
              - (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr')) == (nsapp_lb_server | map(attribute='ipaddress'))
              - nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | length == 0
    

    This runs in ~5 seconds for the 30,000 test entries I fed it.

    To make it easier to find the bad values without making the task extremely ugly, you can split it up into a series of tasks:

    - hosts: localhost
      gather_facts: false
      vars:
        nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
        nsapp_lb_samples:
            - name:                      "SRV-1"
              ipaddress:                 "10.102.102.1"
              comment:                   "Chewbacca"
            - name:                      "SRV-2"
              ipaddress:                 "10.102.102.2"
              comment:                   "C-3PO"
            - name:                      "SRV-3"
              ipaddress:                 "10.102.102.3"
              comment:                   "Obi-Wan Kenobi"
      tasks:
        - name: Check for missing names
          assert:
            that: nsapp_lb_server | rejectattr('name', 'defined') | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name', 'defined') }}"
    
        - name: Check for bad names
          assert:
            that: nsapp_lb_server | rejectattr('name') | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name') }}"
    
        - name: Check for missing IP addresses
          assert:
            that: nsapp_lb_server | rejectattr('ipaddress', 'defined') | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('ipaddress', 'defined') }}"
    
        - name: Check for bad IP addresses
          assert:
            that: (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr')) == (nsapp_lb_server | map(attribute='ipaddress'))
            fail_msg: "Suspicious values: {{ nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr') | symmetric_difference(nsapp_lb_server | map(attribute='ipaddress')) }}"
    
        - name: Check for bad comments
          assert:
            that: nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') }}"
    

    This runs in ~12 seconds for the same list of 30,000 test entries.
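    Since the question mentions json_query: the name and comment checks can also be written as JMESPath filter expressions, whereas the IP address check has no JMESPath equivalent and would still need the ipaddr filter shown above. The sketch below assumes the community.general collection (which provides json_query) and the jmespath Python library are installed on the controller; the task-level variable names name_query and empty_comment_query are hypothetical, chosen only to avoid nested quoting.

    ```yaml
    # Sketch only: assumes community.general (json_query filter) and the
    # jmespath Python library are installed on the Ansible controller.
    - hosts: localhost
      gather_facts: false
      vars:
        nsapp_lb_server:
          - name: "SRV-1"
            ipaddress: "10.102.102.1"
            comment: "Chewbacca"
          - name: "SRV-2"
            ipaddress: "10.102.102.2"
      tasks:
        - name: Check for missing or empty names (JMESPath)
          vars:
            # Hypothetical helper variable. [?!name] keeps items whose name
            # is missing (null) or falsy, e.g. an empty string.
            name_query: "[?!name]"
          ansible.builtin.assert:
            that: nsapp_lb_server | community.general.json_query(name_query) | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | community.general.json_query(name_query) }}"

        - name: Check for empty comments (JMESPath)
          vars:
            # Hypothetical helper variable. Keeps items whose comment is present
            # but empty; a missing comment evaluates to null and is not matched.
            empty_comment_query: "[?comment == '']"
          ansible.builtin.assert:
            that: nsapp_lb_server | community.general.json_query(empty_comment_query) | length == 0
            fail_msg: "Bad entries: {{ nsapp_lb_server | community.general.json_query(empty_comment_query) }}"
    ```

    Whether this is faster than the selectattr/rejectattr version has not been measured here; like that version, it evaluates the whole list inside a single task, which is where the main saving over loop comes from.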