Tags: azure, terraform, azure-aks

Terraform forces AKS node pool replacement without any changes


I have the following resource definition for additional node pools in my k8s cluster:

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  for_each = var.node_pools

  kubernetes_cluster_id   = azurerm_kubernetes_cluster.k8s.id
  name                    = each.key
  vm_size                 = each.value["vm_size"]
  node_count              = each.value["count"]
  node_labels             = each.value["labels"]
  vnet_subnet_id          = var.subnet.id
}
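
For reference, var.node_pools is declared as something like a map of objects keyed by pool name; roughly like this (the example values below are just placeholders, not the real configuration):

variable "node_pools" {
  type = map(object({
    vm_size = string
    count   = number
    labels  = map(string)
  }))

  # Hypothetical example shaped like the "general" pool seen in the plan output below
  default = {
    general = {
      vm_size = "Standard_D4s_v3"
      count   = 2
      labels  = {}
    }
  }
}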

Here is the output from terraform plan:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] has been changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      + availability_zones     = []
        id                     = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general"
        name                   = "general"
      + node_taints            = []
      + tags                   = {}
        # (18 unchanged attributes hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] must be replaced
-/+ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      - availability_zones     = [] -> null
      - enable_auto_scaling    = false -> null
      - enable_host_encryption = false -> null
      - enable_node_public_ip  = false -> null
      ~ id                     = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general" -> (known after apply)
      ~ kubernetes_cluster_id  = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" -> "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourceGroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" # forces replacement
      - max_count              = 0 -> null
      ~ max_pods               = 30 -> (known after apply)
      - min_count              = 0 -> null
        name                   = "general"
      - node_taints            = [] -> null
      ~ orchestrator_version   = "1.20.7" -> (known after apply)
      ~ os_disk_size_gb        = 128 -> (known after apply)
      - tags                   = {} -> null
        # (9 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

As you can see, Terraform wants to force replacement of my node pool because of a change in kubernetes_cluster_id, even though the value is effectively unchanged; the only difference shown in the plan is the casing of resourcegroups vs. resourceGroups in the resource ID. I've been able to work around this by ignoring kubernetes_cluster_id changes in a lifecycle block, but I am still puzzled as to why Terraform detects a change there.

So why does Terraform treat this as a change that forces replacement?


Solution

  • I have fixed this weird bug by introducing a lifecycle block as follows:

    resource "azurerm_kubernetes_cluster_node_pool" "my-node-pool" {
      name = "mynodepool"
      kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
    
      ...
    
      lifecycle {
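        # The plan diff is only a resourcegroups/resourceGroups casing change in the ID,
        # so ignoring it here avoids the spurious forced replacement.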
        ignore_changes = [
          kubernetes_cluster_id
        ]
      }
    } 
    

    Not the cleanest way, but it works. The cluster ID should not change unless you recreate the whole AKS cluster, so it should be safe to ignore.
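
    Applied to the for_each resource from the question, the same workaround would look roughly like this (only the lifecycle block is new; the rest is the original definition):

    resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      for_each = var.node_pools

      kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
      name                  = each.key
      vm_size               = each.value["vm_size"]
      node_count            = each.value["count"]
      node_labels           = each.value["labels"]
      vnet_subnet_id        = var.subnet.id

      lifecycle {
        ignore_changes = [
          kubernetes_cluster_id
        ]
      }
    }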