KEDA scaler not working on AKS with trigger authentication using pod identity


KEDA does not scale when the ScaledObject's trigger uses pod identity to authenticate to an Azure Service Bus queue. I'm following this KEDA service bus triggered scaling project.
Scaling works fine with the connection string, but when I switch the KEDA scaler to pod identity, the KEDA operator fails to get the Azure identity bound to it and logs the following error:

github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).isScaledObjectActive
        /workspace/pkg/scaling/scale_handler.go:228
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
        /workspace/pkg/scaling/scale_handler.go:211
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
        /workspace/pkg/scaling/scale_handler.go:145
2021-10-10T17:35:53.916Z        ERROR   azure_servicebus_scaler error   {"error": "failed to refresh token, error: adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}\n"}

Edited on 11/09/2021: I opened a GitHub issue with KEDA and we did some troubleshooting, but it looks like an issue with AAD Pod Identity, as @Tom suggests. The AAD Pod Identity MIC pod logs errors like this:

E1109 03:15:34.391759       1 mic.go:1111] failed to update user-assigned identities on node aks-agentpool-14229154-vmss (add [2], del [0], update[0]), error: failed to update identities for aks-agentpool-14229154-vmss in MC_Arun_democluster_westeurope, error: compute.VirtualMachineScaleSetsClient#Update: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' with object id 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' has permission to perform action 'Microsoft.Compute/virtualMachineScaleSets/write' on scope '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourceGroups/MC_Arun_democluster_westeurope/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool-14229154-vmss'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourcegroups/arun/providers/microsoft.managedidentity/userassignedidentities/autoscaler-id' or the linked scope(s) are invalid."

Any clues how to fix it?

My TriggerAuthentication and ScaledObject definitions are below:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service-bus-orders
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1 
kind: ScaledObject
metadata:
  name: order-scaler
spec:
  scaleTargetRef:
    name: order-processor
  # minReplicaCount: 0 Change to define how many minimum replicas you want
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: demodemobus
      queueName: orders
      messageCount: '5'
    authenticationRef:
      name: trigger-auth-service-bus-orders

I'm deploying the Azure identity to the keda namespace, where my KEDA deployment resides, and I install KEDA with the following Helm command to set the pod identity binding:

helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=app-autoscaler --namespace keda

Expected Behavior: The KEDA scaler should authenticate with the assigned pod identity, obtain an access token, and scale.

Actual Behavior: The KEDA operator cannot find the assigned Azure identity, and scaling fails.

Scaler Used: Azure Service Bus

Steps to Reproduce the Problem

  1. Create the azure identity and bindings for the KEDA
  2. Install KEDA with the aadpodidentitybinding
  3. Create the scaledobject and triggerauthentication using KEDA pod identity
  4. The scaler fails to authenticate and scale

Solution

  • First and foremost, I am using AKS with the kubenet plugin.

    The AAD Pod Identity documentation states: 'AAD Pod Identity is disabled by default on Clusters with Kubenet starting from release v1.7.'

    This is because kubenet is vulnerable to ARP spoofing. Please read it here.

    Even so, there is a workaround to enable KEDA scaling on a kubenet-based AKS cluster. (The script also holds for other CNIs, except that you don't need to edit the aad-pod-identity nmi DaemonSet definition if it already runs well with your cluster's network plugin.)

    Below I'm adding an e2e script for the same. Please visit the GitHub issue for access to all the discussions.

    # Define aks name and resource group
    $aksResourceGroup = "K8sScalingDemo"
    $aksName = "K8sScalingDemo"
    
    # Create resource group
    az group create -n $aksResourceGroup -l centralindia
    
    # Create the aks cluster with default kubenet plugin
    az aks create -n $aksName -g $aksResourceGroup
    
    # Node resource group, where the AKS-managed resources are deployed
    $resourceGroup = "$(az aks show -g $aksResourceGroup -n $aksName --query nodeResourceGroup -otsv)"
    
    # Set the kubectl context to the newly created aks cluster
    az aks get-credentials -n $aksName -g $aksResourceGroup
    
    # Install AAD Pod Identity into the aad-pod-identity namespace using helm
    kubectl create namespace aad-pod-identity
    helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
    helm install aad-pod-identity aad-pod-identity/aad-pod-identity --namespace aad-pod-identity
    
    # Check the status of installation 
    kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=mic"
    kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=nmi"
    
    # The nmi pods will crash-loop on kubenet; ignore that for now, it is fixed later in this script
    
    # Get the resource ID of the node resource group ($resourceGroup)
    $resourceGroup_ResourceId = az group show --name $resourceGroup --query id -otsv
    
    # Get the aks cluster kubeletidentity client id
    $aad_pod_identity_clientid = az aks show -g $aksResourceGroup -n $aksName --query identityProfile.kubeletidentity.clientId -otsv
    
    # Assign the roles AAD Pod Identity needs to the kubelet identity, scoped to the node resource group
    az role assignment create --role "Managed Identity Operator" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId
    az role assignment create --role "Virtual Machine Contributor" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId
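    
    # Sketch, not part of the original script: the MIC error in the question
    # (LinkedAuthorizationFailed on .../userAssignedIdentities/assign/action) appears when a
    # user-assigned identity lives outside the node resource group, so the assignments above
    # do not cover it. In that case the kubelet identity also needs "Managed Identity Operator"
    # on the identity itself.
    # $externalIdentityResourceId is a hypothetical variable; leave it unset to skip this step.
    if ($externalIdentityResourceId) {
        az role assignment create --role "Managed Identity Operator" --assignee $aad_pod_identity_clientid --scope $externalIdentityResourceId
    }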
    
    # Create autoscaler azure identity and get client id and resource id of the autoscaler identity
    $autoScaleridentityName = "autoscaler-aad-identity"
    az identity create --name $autoScaleridentityName  --resource-group $resourceGroup
    $autoscaler_aad_identity_clientId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query clientId -otsv
    $autoscaler_aad_identity_resourceId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query id -otsv
    
    # Create the app azure identity and get client id and resource id of the app identity
    $appIdentityName = "app-aad-identity"
    az identity create --name $appIdentityName --resource-group $resourceGroup
    $app_aad_identity_clientId = az identity show --name $appIdentityName --resource-group $resourceGroup --query clientId -otsv
    $app_aad_identity_resourceId = az identity show --name $appIdentityName --resource-group $resourceGroup --query id -otsv
    
    # Create service bus and queue
    $servicebus = 'svcbusdemo'
    az servicebus namespace create --name $servicebus --resource-group $resourceGroup --sku basic
    $servicebus_namespace_resourceId = az servicebus namespace show --name $servicebus --resource-group $resourceGroup --query id -otsv
    
    az servicebus queue create --namespace-name $servicebus --name orders --resource-group $resourceGroup
    $servicebus_queue_resourceId = az servicebus queue show --namespace-name $servicebus --name orders --resource-group $resourceGroup --query id -otsv
    
    # Assign Service Bus Data Receiver role to the app identity created
    az role assignment create --role 'Azure Service Bus Data Receiver' --assignee $app_aad_identity_clientId  --scope $servicebus_queue_resourceId
    
    # Create a namespace for order app deployment
    kubectl create namespace keda-dotnet-sample
    
    # Create a yaml deployment configuration variable
    $app_with_identity_yaml= @"
    apiVersion: aadpodidentity.k8s.io/v1
    kind: AzureIdentity
    metadata:
      name: $appIdentityName
      annotations:
        aadpodidentity.k8s.io/Behavior: namespaced
    spec:
      type: 0 # 0 means User-assigned MSI
      resourceID: $app_aad_identity_resourceId
      clientID: $app_aad_identity_clientId
    ---
    apiVersion: aadpodidentity.k8s.io/v1
    kind: AzureIdentityBinding
    metadata:
      name: $appIdentityName-binding
    spec:
      azureIdentity: $appIdentityName
      selector: order-processor
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: order-processor
      labels:
        app: order-processor
    spec:
      selector:
        matchLabels:
          app: order-processor
      template:
        metadata:
          labels:
            app: order-processor
            aadpodidbinding: order-processor
        spec:
          containers:
          - name: order-processor
            image: ghcr.io/kedacore/sample-dotnet-worker-servicebus-queue:latest
            env:
            - name: KEDA_SERVICEBUS_AUTH_MODE
              value: ManagedIdentity
            - name: KEDA_SERVICEBUS_HOST_NAME
              value: $servicebus.servicebus.windows.net
            - name: KEDA_SERVICEBUS_QUEUE_NAME
              value: orders
            - name: KEDA_SERVICEBUS_IDENTITY_USERASSIGNEDID
              value: $app_aad_identity_clientId
    "@
    
    # Create the app deployment with identity bindings using kubectl apply
    $app_with_identity_yaml | kubectl apply --namespace keda-dotnet-sample -f -
    
    # Now the order processor app authenticates with its pod identity and
    # processes messages from the queue.
    # See the [project](https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/pod-identity.md) for details.
    
    # Now start installation of KEDA in namespace keda-system
    
    kubectl create namespace keda-system
    
    # Create a pod identity and binding for autoscaler azure identity
    $autoscaler_yaml =@"
    apiVersion: aadpodidentity.k8s.io/v1
    kind: AzureIdentity
    metadata:
      name: $autoScaleridentityName
    spec:
      type: 0 # 0 means User-assigned MSI
      resourceID: $autoscaler_aad_identity_resourceId
      clientID: $autoscaler_aad_identity_clientId
    ---
    apiVersion: aadpodidentity.k8s.io/v1
    kind: AzureIdentityBinding
    metadata:
      name: $autoScaleridentityName-binding
    spec:
      azureIdentity: $autoScaleridentityName
      selector: $autoScaleridentityName
    "@
    $autoscaler_yaml | kubectl apply --namespace keda-system -f -
    
    # Install KEDA using helm
    helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=autoscaler-aad-identity --namespace keda-system
    
    # Assign Service Bus Data Owner role to keda autoscaler identity
    az role assignment create --role 'Azure Service Bus Data Owner' --assignee $autoscaler_aad_identity_clientId --scope $servicebus_namespace_resourceId
    
    # Apply scaled object definition and trigger authentication provider as `azure`
    $aap_autoscaling_yaml = @"
    apiVersion: keda.sh/v1alpha1
    kind: TriggerAuthentication
    metadata:
      name: trigger-auth-service-bus-orders
    spec:
      podIdentity:
        provider: azure
    ---
    apiVersion: keda.sh/v1alpha1 
    kind: ScaledObject
    metadata:
      name: order-scaler
    spec:
      scaleTargetRef:
        name: order-processor
      # minReplicaCount: 0 Change to define how many minimum replicas you want
      maxReplicaCount: 10
      triggers:
      - type: azure-servicebus
        metadata:
          namespace: $servicebus
          queueName: orders
          messageCount: '5'
        authenticationRef:
          name: trigger-auth-service-bus-orders
    "@
    
    $aap_autoscaling_yaml | kubectl apply --namespace keda-dotnet-sample -f -
    
    # At this point KEDA gets a 401 Unauthorized error because the AAD Pod Identity component `nmi` is not running on the cluster.
    # To fix it, edit the DaemonSet `daemonset.apps/aad-pod-identity-nmi`
    # and add the container arg `--allow-network-plugin-kubenet=true`.
    kubectl edit daemonset.apps/aad-pod-identity-nmi -n aad-pod-identity
    
    # The container args section should look like this after editing
    # (kept as comments so the script itself stays runnable):
    #     spec:
    #       containers:
    #       - args:
    #         - --node=$(NODE_NAME)
    #         - --http-probe-port=8085
    #         - --enableScaleFeatures=true
    #         - --metadata-header-required=true
    #         - --operation-mode=standard
    #         - --kubelet-config=/etc/default/kubelet
    #         - --allow-network-plugin-kubenet=true
    #         env:
    
    # KEDA is now authenticated through the aad-pod-identity metadata endpoint, and the order app should scale up
    # with the queue count.
    # If the order app still reports errors, delete and redeploy it.
    # And that's it: you just scaled your app with KEDA on a kubenet AKS cluster.
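    
    As an alternative to hand-editing the DaemonSet, the same nmi argument can probably be set through Helm. This is a sketch and not part of the original script: it assumes the aad-pod-identity chart exposes the flag as the value nmi.allowNetworkPluginKubenet, so check the chart's values for your version before relying on it.

    ```powershell
    # Assumption: nmi.allowNetworkPluginKubenet maps to the nmi arg --allow-network-plugin-kubenet=true
    helm upgrade --install aad-pod-identity aad-pod-identity/aad-pod-identity --namespace aad-pod-identity --set nmi.allowNetworkPluginKubenet=true

    # Wait for the nmi DaemonSet to roll out with the new argument
    kubectl rollout status daemonset/aad-pod-identity-nmi --namespace aad-pod-identity
    ```

    Setting the value at install time would also avoid the initial crash-looping nmi pods, at the cost of depending on the chart version.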
    
    Note: Read these instructions before you run AAD Pod Identity on a kubenet-based AKS cluster.
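    
    To check the result, a few read-only commands are usually enough. This is a sketch that assumes the names used in the script (namespaces aad-pod-identity, keda-dotnet-sample and keda-system, ScaledObject order-scaler). The HPA name assumes KEDA's keda-hpa-<ScaledObject name> convention, and the Deployment name keda-operator assumes the default Helm chart naming.

    ```powershell
    # nmi pods should be Running once the kubenet flag is set
    kubectl get pods --namespace aad-pod-identity

    # MIC creates an AzureAssignedIdentity when an identity is bound to a pod
    kubectl get azureassignedidentities --all-namespaces

    # The ScaledObject and the HPA that KEDA manages for it
    kubectl get scaledobject order-scaler --namespace keda-dotnet-sample
    kubectl get hpa keda-hpa-order-scaler --namespace keda-dotnet-sample

    # Operator logs should no longer show "Identity not found"
    kubectl logs deployment/keda-operator --namespace keda-system --tail=50
    ```

    If no AzureAssignedIdentity shows up for the KEDA operator pod, the MIC logs are the next place to look, since that is where the LinkedAuthorizationFailed error from the question was reported.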