I have a TKG 2.1.1 (Kubernetes 1.24.10) cluster deployed in Azure in a private network that already has an internal load balancer provisioned (by the Tanzu installer). When attempting to deploy the istio-ingressgateway, the service is stuck in Pending.
Install command:
helm install -f values.yaml istio-ingressgateway istio/gateway -n istio-ingress --wait
values.yaml:
service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
I have also attempted to run an upgrade with alterations to the values file. Revision 2:
service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
    service.beta.kubernetes.io/azure-load-balancer-ipv4: <existing lb ip>
service.beta.kubernetes.io/azure-load-balancer-ipv4: <existing lb ip>
Revision 3:
service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: app-pln-snet
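For the internal-subnet annotation to have any effect, the named subnet has to exist in the vnet the cluster nodes use. A quick check with the Azure CLI (a sketch; the resource group and vnet names are placeholders for your environment):

az network vnet subnet show --resource-group <vnet-rg> --vnet-name <vnet-name> --name app-pln-snet --query "{name:name, prefix:addressPrefix}" --output table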
Regardless of the values used, the status returns:
helm status istio-ingressgateway -n istio-ingress
NAME: istio-ingressgateway
LAST DEPLOYED: Thu Jun 1 05:23:31 2023
NAMESPACE: istio-ingress
STATUS: failed
REVISION: 3
TEST SUITE: None
NOTES:
"istio-ingressgateway" successfully installed!
And the service looks like:
kubectl describe service istio-ingressgateway -n istio-ingress
Name: istio-ingressgateway
Namespace: istio-ingress
Labels: app=istio-ingressgateway
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=istio-ingressgateway
app.kubernetes.io/version=1.17.2
helm.sh/chart=gateway-1.17.2
istio=ingressgateway
Annotations: meta.helm.sh/release-name: istio-ingressgateway
meta.helm.sh/release-namespace: istio-ingress
service.beta.kubernetes.io/azure-load-balancer-internal: true
service.beta.kubernetes.io/azure-load-balancer-internal-subnet: app-pln-snet
Selector: app=istio-ingressgateway,istio=ingressgateway
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 100.69.48.176
IPs: 100.69.48.176
Port: status-port 15021/TCP
TargetPort: 15021/TCP
NodePort: status-port 32090/TCP
Endpoints: 100.96.1.230:15021
Port: http2 80/TCP
TargetPort: 80/TCP
NodePort: http2 31815/TCP
Endpoints: 100.96.1.230:80
Port: https 443/TCP
TargetPort: 443/TCP
NodePort: https 30364/TCP
Endpoints: 100.96.1.230:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
kubectl get service istio-ingressgateway -n istio-ingress -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
istio-ingressgateway LoadBalancer 100.69.48.176 <pending> 15021:32090/TCP,80:31815/TCP,443:30364/TCP 42m app=istio-ingressgateway,istio=ingressgateway
The expectation is that the istio-ingressgateway would connect to the existing Azure internal LB and get an IP.
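Since the service shows no events, the Azure cloud-provider controller logs are the natural next place to look for the reconcile error. A rough sketch (the namespace and label selector are assumptions and vary by TKG release):

kubectl get events -n istio-ingress --field-selector involvedObject.name=istio-ingressgateway
# the out-of-tree Azure cloud provider typically runs in kube-system; the exact
# deployment name / labels differ between TKG releases
kubectl logs -n kube-system -l component=cloud-controller-manager --tail=200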
A couple of changes were needed from my initial deployment of TKG 2.1.1 and Istio 1.17.2 in order to get this to work, and to fix the issue I had to destroy the TKG workload cluster and rebuild it.
The cluster definition YAML used to deploy the workload cluster needed to be altered: the values for creating the outbound LBs had to be commented out.
...
# AZURE_ENABLE_CONTROL_PLANE_OUTBOUND_LB: true
# AZURE_ENABLE_NODE_OUTBOUND_LB: true
# AZURE_CONTROL_PLANE_OUTBOUND_LB_FRONTEND_IP_COUNT: 1
# AZURE_NODE_OUTBOUND_LB_FRONTEND_IP_COUNT: 1
# AZURE_NODE_OUTBOUND_LB_IDLE_TIMEOUT_IN_MINUTES: 4
...
These values told Tanzu to create both an internal LB and an outbound LB for the compute plane. When installing Istio 1.17.2 via Helm, the ingress gateway creation was not able to reconcile against the internal load balancer that had already been generated. In this case, Istio must be allowed to create the internal LB for the compute plane in the cluster, so you cannot have TKG do that.
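To confirm which load balancers the installer actually created (and see the pre-existing internal LB that gets in the way), the Azure CLI can list them; a sketch, with the cluster's node resource group name as a placeholder:

az network lb list --resource-group <cluster-node-rg> --query "[].{name:name, sku:sku.name}" --output table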
The next aspect of the problem is a mismatch in Azure NSG naming. Because we are deploying into a private cluster configuration on Azure, the network, subnets, and NSG already exist. When building it this way, Tanzu expects the NSG name for the compute plane snet to be cluster-name-node-nsg, and it must reside in the resource group with the vnet/snets. However, when Istio attempts to build the internal LB it looks for an NSG named cluster-name-id-node-nsg and fails this check when it doesn't find it.
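The naming mismatch is easy to confirm by listing the NSGs alongside the LBs and comparing against the id embedded in the internal LB's name; a sketch with placeholder resource group names:

az network nsg list --resource-group <vnet-rg> --query "[].name" --output table
az network lb list --resource-group <cluster-node-rg> --query "[].name" --output table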
To reconcile this after the cluster has been generated by TKG, search in the Azure portal for the internal LB that was created for the control plane; it will be named cluster-name-id-internal-lb. You can then create a new NSG named cluster-name-id-node-nsg, using the same id that appears in the LB resource name. The new NSG must be in the same resource group as the vnet, and you must assign it to the compute plane snet of the cluster. This replaces the NSG that was previously set up in the private network in order to install TKG, so you also need to ensure it has the same rules as the NSG it is replacing.
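The same steps can be done with the Azure CLI instead of the portal; a sketch, assuming placeholder resource group, vnet, cluster name, and id values, and that the rules are copied over separately:

# create the NSG with the name the internal LB reconcile expects
az network nsg create --resource-group <vnet-rg> --name <cluster-name>-<id>-node-nsg
# attach it to the compute plane subnet, replacing the original node NSG
az network vnet subnet update --resource-group <vnet-rg> --vnet-name <vnet-name> --name app-pln-snet --network-security-group <cluster-name>-<id>-node-nsg
# list the rules on the old NSG so they can be recreated on the new one
az network nsg rule list --resource-group <vnet-rg> --nsg-name <cluster-name>-node-nsg --output table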
Once the new NSG is in place, Istio will create a new LB named cluster-name-internal with the compute plane as the backend, and the service will get a private IP. You only need to pass these values in the values.yaml with the helm install from the question for that to work:
service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
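After installing (or upgrading) with these values, the service should move out of Pending once the internal LB is reconciled; a quick way to watch for the external IP being assigned:

kubectl get service istio-ingressgateway -n istio-ingress --watch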