Tags: azure, kubernetes, terraform, azure-aks, terraform-provider-azure

AKS with user defined routes


I have an AKS cluster in a hub-spoke topology. The cluster itself is in a subnet of a 10.0.0.0/8 vnet and uses kubenet with 192.168.0.0/16. The outbound type is configured as userDefinedRouting. The default route of the subnet forwards all traffic to the NVA in the hub. However, I would like to keep subnet-internal traffic local (i.e., not route it through the hub), while routing all other traffic, including traffic to other subnets of the vnet, to the hub.

Unfortunately, I'm only able to create the cluster successfully if the only route in the custom route table is 0.0.0.0/0 to NVA and nothing else. If the additional routes are there, it fails.

I reduced the problem to the Terraform setup below. In variables.tf, a pre-existing vnet including its subnet and some details must be passed in. The local variable enable_extended_custom_routes in main.tf creates the routes that keep subnet-internal traffic local while routing traffic destined for the rest of the vnet to the NVA. The setup fails when this variable is set to true but works fine when it is set to false.

############ main.tf ###################

locals {
  enable_extended_custom_routes = true
}

resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

#########################
# Network setup
#########################
data "azurerm_resource_group" "vnet" {
  name = var.virtual_network_resource_group
}
data "azurerm_virtual_network" "main" {
  name                = var.vnet_name
  resource_group_name = data.azurerm_resource_group.vnet.name
}

data "azurerm_subnet" "aks" {
  name                 = var.subnet_name
  resource_group_name  = data.azurerm_virtual_network.main.resource_group_name
  virtual_network_name = data.azurerm_virtual_network.main.name
}

resource "azurerm_user_assigned_identity" "aks" {
  name                = "mi-aks"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  tags = var.tags
}

resource "azurerm_role_assignment" "aks_network" {
  scope                = data.azurerm_resource_group.vnet.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_route_table" "custom_route_table" {
  name                = "aks-rt"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  tags = var.tags
}

resource "azurerm_role_assignment" "route_table_contributor" {
  scope                = azurerm_route_table.custom_route_table.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_subnet_route_table_association" "aks" {
  depends_on = [azurerm_route.default, azurerm_route.keep-subnet-traffic-within-vnet, azurerm_route.route-vnet-traffic-to-nva]
  subnet_id      = data.azurerm_subnet.aks.id
  route_table_id = azurerm_route_table.custom_route_table.id
}

resource "azurerm_route" "default" {
  name                = "default"
  resource_group_name = azurerm_resource_group.main.name
  route_table_name    = azurerm_route_table.custom_route_table.name
  address_prefix      = "0.0.0.0/0"
  next_hop_type = "VirtualAppliance"
  next_hop_in_ip_address = var.nva_ip_address
}

resource "azurerm_route" "keep-subnet-traffic-within-vnet" {
  count              = length(data.azurerm_subnet.aks.address_prefixes) * (local.enable_extended_custom_routes ? 1 : 0)
  name                = "keep-subnet-traffic-within-vnet-${count.index}"
  resource_group_name = azurerm_resource_group.main.name
  route_table_name    = azurerm_route_table.custom_route_table.name
  address_prefix      = data.azurerm_subnet.aks.address_prefixes[count.index]
  next_hop_type  = "VnetLocal"
}

resource "azurerm_route" "route-vnet-traffic-to-nva" {
  for_each = { for i, address_space in data.azurerm_virtual_network.main.address_space : i => address_space if local.enable_extended_custom_routes && !contains(data.azurerm_subnet.aks.address_prefixes, address_space) }
  name                = "route-vnet-traffic-to-nva-${each.key}"
  resource_group_name = azurerm_resource_group.main.name
  route_table_name    = azurerm_route_table.custom_route_table.name
  address_prefix      = each.value
  next_hop_type  = "VirtualAppliance"
  next_hop_in_ip_address = var.nva_ip_address
}


resource "azurerm_kubernetes_cluster" "aks" {
  depends_on = [ azurerm_role_assignment.aks_network, azurerm_subnet_route_table_association.aks, azurerm_role_assignment.route_table_contributor]

  name                                = "my-cluster"
  location                            = azurerm_resource_group.main.location
  resource_group_name                 = azurerm_resource_group.main.name
  node_resource_group                 = "my-cluster-nodes"
  dns_prefix                          = "my-cluster-prefix"
  private_cluster_enabled             = true
  private_cluster_public_fqdn_enabled = true
  private_dns_zone_id                 = "None"
  sku_tier                            = "Standard"
  automatic_channel_upgrade           = "patch"
  kubernetes_version                  = "1.29.4"
  tags                                = var.tags

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }

  default_node_pool {
    name                         = "system"
    temporary_name_for_rotation  = "systemp"
    vm_size                      = "Standard_D4as_v5"
    vnet_subnet_id               = data.azurerm_subnet.aks.id
    os_sku                       = "Ubuntu"
    zones                        = [1, 2, 3]
    enable_auto_scaling          = true
    min_count                    = 1
    max_count                    = 2
    only_critical_addons_enabled = true
    tags = var.tags
    upgrade_settings {
      max_surge = "10%"
    }
  }

  network_profile {
    network_plugin      = "kubenet"
    service_cidr        = "192.168.128.0/17"
    dns_service_ip      = "192.168.128.10"
    outbound_type       = "userDefinedRouting"
    load_balancer_sku   = "standard"
  }
  monitor_metrics {}
  
}

terraform {
  required_version = ">= 1.3"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.97.1"
    }
  }
}

provider "azurerm" {
  use_oidc        = true
  subscription_id = var.subscription_id
  features {}
}
########## variables.tf #####################

variable "subscription_id" {
  type        = string
  description = "ID of the subscription to deploy to."
}

variable "virtual_network_resource_group" {
  type        = string
  description = "Resource group of the vnet."
  default = "rg-aks-vnet-test"
}

variable "vnet_name" {
  type        = string
  description = "Name for vnet."
  default = "vnet_aks_test"
}

variable "subnet_name" {
  type        = string
  description = "Name for AKS subnet."
  default = "testsnet"
}

variable "tags" {
  description = "Tags which should be added."
  type = object({
    tag1 = string
  })
  default = {
    tag1 = "string"
  }
}

variable "resource_group_name" {
  type        = string
  description = "Name of the resource group."
  default = "aks_rg_test"
}

variable "location" {
  type        = string
  description = "The Azure region in which all resources should be provisioned."
  default = "australiaeast"
}

variable "nva_ip_address" {
  type = string
  description = "IP address of the NVA."
  default = "1.2.3.4"
}

Now my question: why does this not work? I could not find any documented restriction that forbids additional routes in a custom route table, only the requirement for the default (0.0.0.0/0) route. Am I overlooking something?

Edit: There are no logs. The provisioning fails due to a timeout; the system node is never created successfully (it remains in the Creating state).

Edit 2: The problem is the route to keep subnet traffic internal (identified by testing). Still no solution.


Solution

  • Because the cluster above is a private cluster, as indicated by

      private_cluster_enabled             = true
    

    a private endpoint is created.

    Unfortunately, since the following feature was introduced: https://azure.microsoft.com/en-us/updates/general-availability-of-user-defined-routes-support-for-private-endpoints/

    the rule of "most specific route wins" no longer applies. The private endpoint for the cluster creates a /32 route directly to the API endpoint, bypassing the usual routing. However, this route is invalidated as soon as any other route in the same route table overlaps with its address space. As a result, the private endpoint becomes unreachable, because no custom route can be defined to reach it.

    To solve this, the feature must be disabled in the subnet settings (PrivateEndpointNetworkPolicies). Further details are in the docs: https://learn.microsoft.com/en-us/azure/private-link/disable-private-endpoint-network-policy?tabs=network-policy-portal
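
    As a minimal sketch, assuming the AKS subnet were managed in the same Terraform configuration (the question reads it as a pre-existing data source, so this is an assumption), the setting could be expressed with the pinned azurerm 3.97.1 provider roughly as follows. The address prefix is a hypothetical placeholder. In azurerm 4.x the boolean argument was, to my knowledge, replaced by a string argument named private_endpoint_network_policies, so check the provider docs for the version in use.

```hcl
# Sketch only: assumes the subnet is created by this configuration instead of
# being looked up through data.azurerm_subnet.aks as in the question.
resource "azurerm_subnet" "aks" {
  name                 = var.subnet_name
  resource_group_name  = data.azurerm_virtual_network.main.resource_group_name
  virtual_network_name = data.azurerm_virtual_network.main.name

  # Hypothetical prefix; replace with the real AKS subnet range.
  address_prefixes = ["10.1.0.0/24"]

  # Disables PrivateEndpointNetworkPolicies on the subnet, so routes in the
  # custom route table no longer affect the private endpoint's /32 route.
  private_endpoint_network_policies_enabled = false
}
```

    If the subnet stays outside this configuration, as in the question, the same setting has to be changed where the subnet is managed, for example through the portal steps described in the linked docs.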