I have an AKS cluster in a hub-spoke setup. The cluster sits in a subnet of a 10.0.0.0/8 vnet and uses kubenet with 192.168.0.0/16. The outbound type is configured as userDefinedRouting. The default route of the subnet forwards all traffic to the NVA in the hub. However, I would like to keep all subnet-specific traffic internal (i.e., not route it over the hub) while routing all other traffic, including traffic destined for other subnets of the vnet, to the hub.
Unfortunately, I am only able to create the cluster successfully if the only route in the custom route table is 0.0.0.0/0 to the NVA and nothing else. As soon as the additional routes are present, cluster creation fails.
I reproduced the problem with the Terraform setup below. In variables.tf, a pre-existing vnet (including a subnet) and some details must be passed in. The local variable enable_extended_custom_routes in main.tf creates the routes that keep subnet-specific traffic internal while routing traffic destined for the rest of the vnet to the NVA. The setup fails when this variable is set to true but works fine when it is set to false.
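To illustrate the intent (the AKS subnet prefix 10.1.0.0/24 is just an assumed example; the real prefixes come from the data sources below), the resulting route table should look like this:

0.0.0.0/0    -> VirtualAppliance (NVA)  # everything else goes via the hub
10.1.0.0/24  -> VnetLocal               # assumed AKS subnet prefix, stays local
10.0.0.0/8   -> VirtualAppliance (NVA)  # rest of the vnet also goes via the hub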
############ main.tf ###################
locals {
  enable_extended_custom_routes = true
}

resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}
#########################
# Network setup
#########################
data "azurerm_resource_group" "vnet" {
name = var.virtual_network_resource_group
}
data "azurerm_virtual_network" "main" {
name = var.vnet_name
resource_group_name = data.azurerm_resource_group.vnet.name
}
data "azurerm_subnet" "aks" {
name = var.subnet_name
resource_group_name = data.azurerm_virtual_network.main.resource_group_name
virtual_network_name = data.azurerm_virtual_network.main.name
}
resource "azurerm_user_assigned_identity" "aks" {
name = "mi-aks"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
tags = var.tags
}
resource "azurerm_role_assignment" "aks_network" {
scope = data.azurerm_resource_group.vnet.id
role_definition_name = "Contributor"
principal_id = azurerm_user_assigned_identity.aks.principal_id
}
resource "azurerm_route_table" "custom_route_table" {
name = "aks-rt"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
tags = var.tags
}
resource "azurerm_role_assignment" "route_table_contributor" {
scope = azurerm_route_table.custom_route_table.id
role_definition_name = "Contributor"
principal_id = azurerm_user_assigned_identity.aks.principal_id
}
resource "azurerm_subnet_route_table_association" "aks" {
depends_on = [azurerm_route.default, azurerm_route.keep-subnet-traffic-within-vnet, azurerm_route.route-vnet-traffic-to-nva]
subnet_id = data.azurerm_subnet.aks.id
route_table_id = azurerm_route_table.custom_route_table.id
}
resource "azurerm_route" "default" {
name = "default"
resource_group_name = azurerm_resource_group.main.name
route_table_name = azurerm_route_table.custom_route_table.name
address_prefix = "0.0.0.0/0"
next_hop_type = "VirtualAppliance"
next_hop_in_ip_address = var.nva_ip_address
}
resource "azurerm_route" "keep-subnet-traffic-within-vnet" {
count = length(data.azurerm_subnet.aks.address_prefixes) * (local.enable_extended_custom_routes ? 1 : 0)
name = "keep-subnet-traffic-within-vnet-${count.index}"
resource_group_name = azurerm_resource_group.main.name
route_table_name = azurerm_route_table.custom_route_table.name
address_prefix = data.azurerm_subnet.aks.address_prefixes[count.index]
next_hop_type = "VnetLocal"
}
resource "azurerm_route" "route-vnet-traffic-to-nva" {
for_each = { for i, address_space in data.azurerm_virtual_network.main.address_space : i => address_space if local.enable_extended_custom_routes && !contains(data.azurerm_subnet.aks.address_prefixes, address_space) }
name = "route-vnet-traffic-to-nva-${each.key}"
resource_group_name = azurerm_resource_group.main.name
route_table_name = azurerm_route_table.custom_route_table.name
address_prefix = each.value
next_hop_type = "VirtualAppliance"
next_hop_in_ip_address = var.nva_ip_address
}
resource "azurerm_kubernetes_cluster" "aks" {
depends_on = [ azurerm_role_assignment.aks_network, azurerm_subnet_route_table_association.aks, azurerm_role_assignment.route_table_contributor]
name = "my-cluster"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
node_resource_group = "my-cluster-nodes"
dns_prefix = "my-cluster-prefix"
private_cluster_enabled = true
private_cluster_public_fqdn_enabled = true
private_dns_zone_id = "None"
sku_tier = "Standard"
automatic_channel_upgrade = "patch"
kubernetes_version = "1.29.4"
tags = var.tags
identity {
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.aks.id]
}
default_node_pool {
name = "system"
temporary_name_for_rotation = "systemp"
vm_size = "Standard_D4as_v5"
vnet_subnet_id = data.azurerm_subnet.aks.id
os_sku = "Ubuntu"
zones = [1, 2, 3]
enable_auto_scaling = true
min_count = 1
max_count = 2
only_critical_addons_enabled = true
tags = var.tags
upgrade_settings {
max_surge = "10%"
}
}
network_profile {
network_plugin = "kubenet"
service_cidr = "192.168.128.0/17"
dns_service_ip = "192.168.128.10"
outbound_type = "userDefinedRouting"
load_balancer_sku = "standard"
}
monitor_metrics {}
}
terraform {
  required_version = ">= 1.3"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.97.1"
    }
  }
}

provider "azurerm" {
  use_oidc        = true
  subscription_id = var.subscription_id
  features {}
}
########## variables.tf #####################
variable "subscription_id" {
type = string
description = "ID of the subscription to deploy to."
}
variable "virtual_network_resource_group" {
type = string
description = "Resource group of the vnet."
default = "rg-aks-vnet-test"
}
variable "vnet_name" {
type = string
description = "Name for vnet."
default = "vnet_aks_test"
}
variable "subnet_name" {
type = string
description = "Name for AKS subnet."
default = "testsnet"
}
variable "tags" {
description = "Tags which should be added."
type = object({
tag1= string
})
default = {
tag1= "string"
}
}
variable "resource_group_name" {
type = string
description = "Name of the resource group."
default = "aks_rg_test"
}
variable "location" {
type = string
description = "The Azure region in which all resources should be provisioned."
default = "australiaeast"
}
variable "nva_ip_address" {
type = string
description = "IP address of the NVA."
default = "1.2.3.4"
}
Now my question: why does this not work? I could not find any documented restriction against additional routes in a custom route table, only the requirement for the default (0.0.0.0/0) route. Am I overlooking something?
Edit: There are no logs. The provisioning fails with a timeout; the system node pool is not created successfully (it remains in the Creating state).
Edit 2: The problem is the route that keeps subnet traffic internal (identified by testing the routes individually). Still no solution.
As the above is a private cluster, which is indicated by private_cluster_enabled = true, a private endpoint is created for the API server.
Unfortunately, since the following feature was introduced: https://azure.microsoft.com/en-us/updates/general-availability-of-user-defined-routes-support-for-private-endpoints/
the usual rule of "most specific route wins" no longer applies. The private endpoint for the cluster creates a /32 route directly to the API server, circumventing normal routing. However, this route is invalidated as soon as any other route in the same route table overlaps its address space. As a result, the private endpoint becomes unreachable, and no custom route can be added to reach it.
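One way to observe this is to inspect the effective routes of a node NIC in the node resource group (for example with az network nic show-effective-route-table): the private endpoint normally injects a /32 route with next hop type InterfaceEndpoint, and once an overlapping custom route is in the table that route no longer takes effect.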
To solve it, the feature must be disabled in the subnet's settings (PrivateEndpointNetworkPolicies). Further details in the docs: https://learn.microsoft.com/en-us/azure/private-link/disable-private-endpoint-network-policy?tabs=network-policy-portal
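If the subnet were managed in this configuration instead of being passed in, a minimal sketch of the fix could look like this (with the azurerm 3.x provider pinned above, the attribute is called private_endpoint_network_policies_enabled; the names come from the variable defaults and the address prefix is illustrative):

resource "azurerm_subnet" "aks" {
  name                 = "testsnet"
  resource_group_name  = "rg-aks-vnet-test"
  virtual_network_name = "vnet_aks_test"
  address_prefixes     = ["10.1.0.0/24"] # illustrative prefix

  # Disable PrivateEndpointNetworkPolicies so that custom routes no longer
  # apply to the private endpoint and its /32 route stays effective.
  private_endpoint_network_policies_enabled = false
}

For a pre-existing subnet, the same setting can be changed in the portal or with az network vnet subnet update --disable-private-endpoint-network-policies true.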