Tags: azure, kubernetes, terraform, azure-aks, azure-cli

aci-connector-linux pod for Azure AKS in CrashLoopBackOff status


I am having an issue setting up Virtual Nodes for an Azure Kubernetes Service (AKS) cluster using Terraform.

When I check the aci-connector-linux pod, I see the following events:

Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   Pulled   41m (x50 over 4h26m)    kubelet  Container image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.1" already present on machine
  Warning  BackOff  68s (x1222 over 4h26m)  kubelet  Back-off restarting failed container

I've also granted the system-assigned identity of the AKS cluster the required Contributor role, following the example here - https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/examples/kubernetes/aci_connector_linux/main.tf - but the pod is still in CrashLoopBackOff status.


Solution

  • I finally fixed it.

    The issue was caused by the outdated aci-connector-linux example here - https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/examples/kubernetes/aci_connector_linux/main.tf - which assigns the role to the managed identity of the AKS cluster itself, not to the identity that the aci-connector-linux add-on actually uses.

    Here's how I fixed it:

    Azure Kubernetes Service creates a node resource group that is separate from the resource group of the Kubernetes cluster. Within the node resource group, AKS creates a managed identity for the aci-connector-linux add-on. By default the node resource group is named MC_<KubernetesResourceGroupName>_<KubernetesServiceName>_<KubernetesResourceGroupLocation>, so if KubernetesResourceGroupName is MyResourceGroup, KubernetesServiceName is my-test-cluster, and KubernetesResourceGroupLocation is westeurope, the node resource group will be MC_MyResourceGroup_my-test-cluster_westeurope. You can view its resources in the Azure Portal under Resource Groups.
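    A small sketch of that default naming rule; `node_rg_name` is a hypothetical helper, not part of any tool. Because the node resource group name can be overridden when the cluster is created, the commented `az aks show` query is the safer way to read the actual value from the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the DEFAULT node resource group name
# MC_<resource-group>_<cluster-name>_<location>.
node_rg_name() {
  local rg="$1" cluster="$2" location="$3"
  printf 'MC_%s_%s_%s\n' "$rg" "$cluster" "$location"
}

node_rg_name MyResourceGroup my-test-cluster westeurope
# Prints: MC_MyResourceGroup_my-test-cluster_westeurope

# The real name is stored on the cluster (requires the Azure CLI and az login):
#   az aks show --resource-group MyResourceGroup --name my-test-cluster \
#     --query nodeResourceGroup --output tsv
```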

    Next, find the root cause by viewing the logs of the aci-connector-linux pod:

    kubectl logs aci-connector-linux-577bf54d75-qm9kl -n kube-system
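    The pod name suffix (-577bf54d75-qm9kl here) is generated per cluster. A minimal sketch, assuming kubectl is pointed at the AKS cluster: the hypothetical `find_aci_pod` helper picks the pod out of `kubectl get pods -o name`, and `--previous` shows the logs of the last crashed container, which is where a CrashLoopBackOff error usually shows up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl get pods -o name" output on stdin
# and prints the first aci-connector-linux pod (e.g. pod/aci-connector-linux-...).
find_aci_pod() {
  grep -m 1 'aci-connector-linux'
}

# Usage against a live cluster:
#   pod=$(kubectl get pods -n kube-system -o name | find_aci_pod)
#   kubectl logs "$pod" -n kube-system --previous
```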
    

    You will see output like this:

    time="2022-06-29T15:23:38Z" level=fatal msg="error initializing provider azure: error setting up network profile: error while looking up subnet: api call to https://management.azure.com/subscriptions/0237fb7-7530-43ba-96ae-927yhfad80d1/resourcegroups/MyResourceGroup/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/k8s-aci-node-pool-subnet?api-version=2018-08-01: got HTTP response status code 403 error code "AuthorizationFailed": The client '560df3e9b-9f64-4faf-aa7c-6tdg779f81c7' with object id '560df3e9b-9f64-4faf-aa7c-6tdg779f81c7' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/0237fb7-7530-43ba-96ae-927yhfad80d1/resourcegroups/MyResourceGroup/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/k8s-aci-node-pool-subnet' or the scope is invalid. If access was recently granted, please refresh your credentials."
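    The two values the fix needs, the object ID of the denied identity and the subnet scope, are both in that log line. A sketch with hypothetical helper names, assuming the message keeps the `with object id '...'` and `over scope '...'` wording shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: each reads one aci-connector-linux log line on stdin.

# Prints the object ID of the identity that was denied access.
extract_object_id() {
  sed -n "s/.*with object id '\([^']*\)'.*/\1/p"
}

# Prints the scope (here, the subnet resource ID) it was denied on.
extract_scope() {
  sed -n "s/.*over scope '\([^']*\)'.*/\1/p"
}

# Example with the relevant part of the log line above.
line="The client '560df3e9b-9f64-4faf-aa7c-6tdg779f81c7' with object id '560df3e9b-9f64-4faf-aa7c-6tdg779f81c7' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/0237fb7-7530-43ba-96ae-927yhfad80d1/resourcegroups/MyResourceGroup/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/k8s-aci-node-pool-subnet' or the scope is invalid."
printf '%s\n' "$line" | extract_object_id
printf '%s\n' "$line" | extract_scope
```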

    You can fix this in Terraform by granting the aci-connector-linux identity the Network Contributor role on the subnet:

    # Look up the subnet used by the virtual node (ACI)
    data "azurerm_subnet" "k8s_aci" {
      name                 = "k8s-aci-node-pool-uat-subnet"
      virtual_network_name = "sparkle-uat-vnet"
      resource_group_name  = data.azurerm_resource_group.main.name
    }
    
    # Look up the aci-connector-linux identity that AKS creates in the node resource group
    data "azuread_service_principal" "aks_aci_identity" {
      display_name = "aciconnectorlinux-${var.kubernetes_cluster_name}"
      depends_on   = [module.kubernetes_service_uat]
    }
    
    # Grant that identity the role on the subnet (uses the author's own role-assignment module)
    module "role_assignment_aci_nodepool_subnet" {
      source = "../../../modules/azure/role-assignment"
    
      role_assignment_scope        = data.azurerm_subnet.k8s_aci.id
      role_definition_name         = var.role_definition_name.net-contrib
      # object_id is the identity's Object (principal) ID, which is what a role assignment expects
      role_assignment_principal_id = data.azuread_service_principal.aks_aci_identity.object_id
    }
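    After applying, one way to confirm that the assignment exists is to query it with the Azure CLI. A sketch, assuming an authenticated Azure CLI; `has_network_contributor` is a hypothetical helper that succeeds only if the principal holds Network Contributor at that scope.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if the principal has "Network Contributor" on the scope.
has_network_contributor() {
  local principal_id="$1" scope="$2"
  # List role names assigned to the principal at the scope, then look for an exact match
  az role assignment list --assignee "$principal_id" --scope "$scope" \
    --query "[].roleDefinitionName" --output tsv | grep -qx 'Network Contributor'
}

# Usage (requires az login), with the IDs from the error message:
#   has_network_contributor <Object (principal) ID> <subnet-id> && echo "role assigned"
```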
    

    You can also achieve this using the Azure CLI command below:

    az role assignment create --assignee <Object (principal) ID> --role "Network Contributor" --scope <subnet-id>
    

    Note: The Object (principal) ID is the object ID shown in the error message.

    For example:

    az role assignment create --assignee 560df3e9b-9f64-4faf-aa7c-6tdg779f81c7 --role "Network Contributor" --scope /subscriptions/0237fb7-7530-43ba-96ae-927yhfad80d1/resourcegroups/MyResourceGroup/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/k8s-aci-node-pool-subnet
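    Instead of copying IDs from the log, they can also be looked up. A sketch, assuming the default identity name aciconnectorlinux-<cluster>, that the subnet lives in the cluster's resource group (as in the error above), and an authenticated Azure CLI; `grant_aci_subnet_access` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants the aci-connector-linux identity Network Contributor on the subnet.
# Assumes the identity is named aciconnectorlinux-<cluster> and sits in the node resource group.
grant_aci_subnet_access() {
  local rg="$1" cluster="$2" vnet="$3" subnet="$4"
  local node_rg principal_id subnet_id

  # Node resource group (MC_...) as recorded on the cluster
  node_rg=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query nodeResourceGroup --output tsv) || return 1

  # Object (principal) ID of the aci-connector-linux managed identity
  principal_id=$(az identity show --resource-group "$node_rg" \
    --name "aciconnectorlinux-$cluster" --query principalId --output tsv) || return 1

  # Resource ID of the subnet used by the virtual node
  subnet_id=$(az network vnet subnet show --resource-group "$rg" \
    --vnet-name "$vnet" --name "$subnet" --query id --output tsv) || return 1

  az role assignment create --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Network Contributor" --scope "$subnet_id"
}

# Usage (requires az login):
#   grant_aci_subnet_access MyResourceGroup my-test-cluster my-vnet k8s-aci-node-pool-subnet
```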
    

    Resources:

    Aci connector linux should export the identity associated to its addon

    Using Terraform to create an AKS cluster with "SystemAssigned" identity and aci_connector_linux profile enabled does not result in a creation of a virtual node

    Azure Kubernetes Service Tutorial: How to Integrate AKS with Azure Container Instances

    Fail to configure a load balancer (AKS)