
How to troubleshoot API requests sent by an Ansible module?


Here's the context:
The following playbook (simplified to a single task for this question) gives different results depending on where I launch it. It uses the nutanix.ncp Galaxy collection (tested with versions 1.9.0, 1.8.0, 1.7.0...).

The ntnx_subnets_info module is called to retrieve the list of existing subnets on a Prism Central instance, filtered through the name parameter so that only the details of a specific VLAN are returned.

---
- name: test-get-subnet-info
  hosts: localhost

  vars:
    nutanix_host: "{{ XXXXXXX }}"
    nutanix_username: "{{ XXXXXXX }}"
    nutanix_password: "XXXXXXX "

  collections:
    - nutanix.ncp
  module_defaults:
    group/nutanix.ncp.ntnx:
      nutanix_host: "{{ XXXXXXX }}"
      nutanix_username: "{{ XXXXXXX }}"
      nutanix_password: "XXXXXXX "

  tasks:
  - name: Retrieve subnet info
    ntnx_subnets_info:
      filter:
        name: "my-VLAN"

On a Debian 11 server, this task runs smoothly:

[screenshot: successful run on the Debian 11 server]

Whereas on my custom AWX EE (tested first with Docker, then on K8s), I get the following error, which isn't very explicit:

Failed to convert API response to json

[screenshot: failed run in the AWX EE]

Troubleshooting steps:

  • Downgraded the nutanix.ncp collection from 1.9.0 to 1.8.0 and 1.7.0 => same result (works on Debian, not from Docker)
  • Compared Ansible versions => both environments run ansible-core 2.15.4
  • Compared Python versions => the Docker image is on 3.9.17 and the Debian server on 3.9.2
  • Ran manual curl requests from both environments => everything works in both
  • Ran the playbook with the -vvvvvv option and compared the logs => nutanix.ncp is not very chatty; I don't get any error other than "Failed to convert API response to json".

Questions

  • Is there a way to "analyze" the API requests sent by Ansible, like a Wireshark or Fiddler for Ansible?
  • How can I go further to troubleshoot and fix this issue? Since it works in one environment but not the other, I should be able to compare them and hopefully find a difference.

Solution

  • After many tests and investigations, I found entity.py in the nutanix.ncp collection files. That file is responsible for the message "Failed to convert API response to json".

    • One function emits this error message when the HTTP response code is greater than 300. I took a Wireshark capture and saw an error in the network flow: [screenshot: Wireshark capture showing the error]

    • That led me to check the connection from my Docker image / K8s pod to my Prism Central.

      The command openssl s_client -connect fqdn_prism_central:9440 showed the error message

    "Verify return code: 20 (unable to get local issuer certificate)"

    • Finally, I updated the certificate chain in my Docker image (through my Dockerfile), and everything now works in AWX.

      In my case, since the AWX EE image is based on the official awx-ee (https://quay.io/repository/ansible/awx-ee?tab=tags&tag=latest), I added the following steps to my Dockerfile (note: paths and commands may differ if you're using something other than a CentOS-based image):

    COPY ./certificate_chain.pem /etc/pki/ca-trust/source/anchors/certificate_chain.pem
    RUN chmod 644 /etc/pki/ca-trust/source/anchors/certificate_chain.pem && update-ca-trust extract
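To confirm from inside the rebuilt image that the new chain is actually trusted, the same check as the openssl command can be run from Python, which is also the interpreter that executes the module. This is only a minimal sketch and not part of nutanix.ncp: the function name check_tls is made up for illustration, and it assumes that Python's default SSL context reads the system trust store that update-ca-trust regenerates.

```python
import socket
import ssl


def check_tls(host: str, port: int = 9440, timeout: float = 5.0) -> str:
    """Return "ok" if the server's certificate chain verifies, else the reason.

    Hypothetical helper for troubleshooting; 9440 is the Prism Central port
    used in the openssl command above.
    """
    # Assumption: the default context uses the system CA bundle of the image.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake (and certificate verification) happens here.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_code 20 is the "unable to get local issuer certificate" case.
        return f"verify error {exc.verify_code}: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts, other TLS errors.
        return f"connection error: {exc}"
```

Calling check_tls("fqdn_prism_central") inside the container before the fix should report a verify error similar to the one from openssl, and "ok" once the chain is installed.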
    

    Pretty tough, but it works great! :)
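The hard part here was that "Failed to convert API response to json" says nothing about what was actually received. When replaying a request by hand, it can help to keep the status code and the raw body whenever JSON parsing fails. The snippet below is a hedged sketch of that idea only: parse_api_response is a hypothetical helper, not the code found in entity.py, and the 500-character cut-off is an arbitrary choice.

```python
import json


def parse_api_response(status: int, body: bytes) -> dict:
    """Parse a JSON body, keeping the status and raw text when parsing fails.

    Hypothetical helper for manual troubleshooting, not the collection's code.
    """
    text = body.decode("utf-8", errors="replace")
    try:
        return {"status": status, "json": json.loads(text)}
    except json.JSONDecodeError as exc:
        return {
            "status": status,
            "error": f"body is not JSON: {exc.msg}",
            # Keep the beginning of the body so an HTML error page or an
            # empty response is visible instead of a generic message.
            "raw": text[:500],
        }
```

With a result like this, an empty body or an HTML error page shows up immediately, which points toward the network or TLS layer rather than the API itself.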