python ansible ansible-runner

Ansible Runner consecutive calls mess up when done too fast


I have built a service on top of the official Ansible Runner library. It receives remote calls, each of which runs one or more playbooks one or more times. The Ansible run configuration is sequential, although this should not matter across different calls (if I understand correctly, it only governs the tasks inside a single playbook run).

So, I run the playbooks using Ansible Runner's run_async():

runner_async_thread, runner_object = ansible_runner.run_async(
                **{k: v for k, v in kwargs.items() if v is not None})

and keep looping on the asynchronous thread's is_alive() method while checking for other conditions:

while runner_async_thread.is_alive():
    ...

If an exception is raised, or after the thread finishes, I just check the status result and return.
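For reference, here is a minimal sketch of that kind of polling loop. It uses a plain threading.Thread as a stand-in for the thread returned by run_async(), so it runs without Ansible. The names wait_for_thread, should_stop and poll_interval are hypothetical and are not part of Ansible Runner.

```python
import threading
import time


def wait_for_thread(thread, timeout, should_stop=lambda: False, poll_interval=0.1):
    """Wait for `thread` to finish; return "finished", "stopped" or "timeout".

    Hypothetical helper: `should_stop` stands for the "other conditions"
    checked inside the loop.
    """
    deadline = time.monotonic() + timeout
    while thread.is_alive():
        if should_stop():
            return "stopped"
        if time.monotonic() >= deadline:
            return "timeout"
        # join() waits at most poll_interval and returns early if the thread ends,
        # so the loop does not spin at full CPU.
        thread.join(poll_interval)
    return "finished"


if __name__ == "__main__":
    worker = threading.Thread(target=time.sleep, args=(0.2,))
    worker.start()
    print(wait_for_thread(worker, timeout=5))
```

Using join() with a short timeout instead of an empty loop body keeps the wait cheap while still letting the other conditions be checked regularly.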

The issue is that, when the system receives many calls at once, the runs interfere with each other, and I get errors such as this one:

The offending line appears to be:


{"username": "operator", "password": "!", "target": "sever_003_linux"}05_linux"}
                                                                      ^ here
We could be wrong, but this one looks like it might be an issue with
unbalanced quotes. If starting a value with a quote, make sure the
line ends with the same set of quotes. For instance this arbitrary
example:

    foo: "bad" "wolf"

Could be written as:

    foo: '"bad" "wolf"'

The problem is obviously the trailing garbage after the closing brace:

    {"username": "new_user", "target": "sever_003_linux"}05_linux"}

I double-checked (logs and env/extravars files), and the commands sent are correct:

{"username": "new_user", "target": "sever_003_linux"}

So it seems a memory area is being overwritten without being cleared. Could two runners be executing at the same time (which seems to be possible) without thread safety? Do you have any idea how to fix this or prevent it from happening?

The code normally works, and the same calls succeed when I add some delays between them, but I don't think that is an ideal solution.

I tried changing the Ansible configuration, but it did not help.

ansible 2.9.6
python version = 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]

Solution

  • I found other people reporting this issue in this Jira story: https://jira.opencord.org/browse/CORD-922

    Ansible, when used via its API, is not thread-safe.

    They also propose a way to work around the problem:

    To be safe and avoid such issues, we will wrap invocations of Ansible in a process by invoking a fork() before using it.

    But in my case I have to return the result of the operation in order to report it. Therefore, I create a shared queue so that the two processes can communicate, and I fork the main process.

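    import os
    from multiprocessing import Queue

    import ansible_runner

    #...

    def playbook_run(self, playbook_name, parameters):
        #...
        runner_async_thread, runner_object = ansible_runner.run_async(
            **{k: v for k, v in kwargs.items() if v is not None})
        while runner_async_thread.is_alive():
            ...  # check for other conditions
        return run_result


    # Wrapper name is illustrative; the fork logic must live inside a method
    # so that it can use self and return the result.
    def playbook_run_forked(self, playbook_name, parameters):
        shared_queue = Queue()
        process_pid = os.fork()
        if process_pid == 0:  # the forked child process will independently run & report
            run_result = self.playbook_run(playbook_name, parameters)
            shared_queue.put(run_result)
            shared_queue.close()
            shared_queue.join_thread()  # flush the queue before the hard exit
            os._exit(0)
        else:  # the parent process will wait until it gets the report
            run_result = shared_queue.get()
            os.waitpid(process_pid, 0)  # reap the child so it does not linger as a zombie
            return run_result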
    

    And, assuming that the lack of thread safety was the issue, the problem is solved.

    As I don't think it had been reported, I opened an issue on the Ansible Runner GitHub repository: https://github.com/ansible/ansible-runner/issues/808
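The same "run it in a child process and send the result back through a queue" idea can also be sketched with multiprocessing.Process instead of a raw os.fork(). This is only an illustration under stated assumptions: run_isolated and _child are hypothetical names, the "fork" start method is POSIX-only, and the function you pass in (for example a wrapper around the playbook run) must return something picklable.

```python
import multiprocessing as mp


def _child(queue, func, args, kwargs):
    """Run func in the child and send back either its result or the error text."""
    try:
        queue.put(("ok", func(*args, **kwargs)))
    except Exception as exc:  # report failures instead of leaving the parent waiting
        queue.put(("error", repr(exc)))


def run_isolated(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a forked child process and return its result."""
    ctx = mp.get_context("fork")  # POSIX only; mirrors the os.fork() approach
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue, func, args, kwargs))
    proc.start()
    # Read the result before join(): joining first can deadlock if the child
    # is still blocked flushing a large result into the queue.
    status, payload = queue.get()
    proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def _demo(a, b):
    # Stand-in for the real work, e.g. a function that runs a playbook.
    return a + b


if __name__ == "__main__":
    print(run_isolated(_demo, 2, 3))
```

Compared with a bare fork(), Process takes care of reaping the child on join(), and the try/except in the child means an exception is reported to the parent rather than leaving it blocked on the queue. A child that is killed outright would still leave queue.get() waiting, so a timeout on get() may be worth adding in real use.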