I set up what I think is a bog-standard EKS cluster using terraform-aws-eks, like so:
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name                    = "my-test-cluster"
  cluster_version                 = "1.21"
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts = "OVERWRITE"
    }
  }

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    disk_size      = 50
    instance_types = ["m5.large"]
  }

  eks_managed_node_groups = {
    green_test = {
      min_size       = 1
      max_size       = 2
      desired_size   = 2
      instance_types = ["t3.large"]
      capacity_type  = "SPOT"
    }
  }
}
Then I tried to install Istio following the install docs:
istioctl install
which resulted in this:
✔ Istio core installed
✔ Istiod installed
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istio-ingressgateway (containers with unready status: [istio-proxy])
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation
So I did a bit of digging:
kubectl logs istio-ingressgateway-7fd568fc99-6ql8h -n istio-system
which led to:
2022-04-17T13:51:14.540346Z warn ca ca request failed, starting attempt 1 in 90.275446ms
2022-04-17T13:51:14.631695Z warn ca ca request failed, starting attempt 2 in 195.118437ms
2022-04-17T13:51:14.827286Z warn ca ca request failed, starting attempt 3 in 394.627125ms
2022-04-17T13:51:15.222738Z warn ca ca request failed, starting attempt 4 in 816.437569ms
2022-04-17T13:51:16.039427Z warn sds failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp i/o timeout"
2022-04-17T13:51:33.941084Z warning envoy config StreamAggregatedResources gRPC config stream closed since 318s ago: 14, connection error: desc = "transport: Error while dialing dial tcp i/o timeout"
2022-04-17T13:52:05.830859Z warning envoy config StreamAggregatedResources gRPC config stream closed since 350s ago: 14, connection error: desc = "transport: Error while dialing dial tcp i/o timeout"
2022-04-17T13:52:26.232441Z warning envoy config StreamAggregatedResources gRPC config stream closed since 370s ago: 14, connection error: desc = "transport: Error while dialing dial tcp i/o timeout"
So from a lot of reading, it seems like maybe the istio-ingressgateway pod is not able to connect to istiod?
After some Googling, I found this: https://istio.io/latest/docs/ops/diagnostic-tools/proxy-cmd/#verifying-connectivity-to-istiod
kubectl create namespace foo
kubectl apply -f <(istioctl kube-inject -f samples/sleep/sleep.yaml) -n foo
kubectl exec $(kubectl get pod -l app=sleep -n foo -o jsonpath={.items..metadata.name}) -c sleep -n foo -- curl -sS istiod.istio-system:15014/version
which gives me:
curl: (7) Failed to connect to istiod.istio-system port 15014 after 4 ms: Connection refused
command terminated with exit code 7
So I think this problem is not specific to the istio-ingressgateway, but a more general networking issue in a standard EKS cluster?
Thanks in advance!
[22-04-18] Update 1:
OK, so the test with the sleep pod in the foo namespace leads me to believe that the connection failures have to do with AWS security group rules. The theory is that if the security group ports are not opened, you'd see the sort of "connection refused" and "i/o timeout" messages that I see. To test the theory, I took the 4 security groups that are created by this module and opened up all inbound and outbound traffic on all of them, then ran the install again:
istioctl install
This will install the Istio 1.13.2 default profile with ["Istio core" "Istiod" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete
Making this installation the default for injection and validation.
Et voilà! OK, now I think I need to work backwards and isolate -which- ports are needed, which security group to apply them to, and whether they belong on the inbound or outbound side. Once I have those, I can PR it back to terraform-aws-eks and save someone else hours of headache.
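The "open everything" experiment can also be written down in Terraform, which makes it easier to narrow the rules later. The following is a minimal, untested sketch, and it is narrower than what was actually done above: it only opens node-to-node traffic on the node security group, not all four groups. The resource names are hypothetical; `module.eks.node_security_group_id` is the module output for the node security group. Treat it as a debugging aid only and remove it once the required ports are known.

```hcl
# DEBUG ONLY (hypothetical resource names): allow all traffic between nodes
# that share the node security group, in both directions.
resource "aws_security_group_rule" "debug_node_to_node_ingress" {
  description       = "DEBUG ONLY: node to node ingress, all ports and protocols"
  type              = "ingress"
  protocol          = "-1" # all protocols
  from_port         = 0
  to_port           = 0
  self              = true # source is this same security group
  security_group_id = module.eks.node_security_group_id
}

resource "aws_security_group_rule" "debug_node_to_node_egress" {
  description       = "DEBUG ONLY: node to node egress, all ports and protocols"
  type              = "egress"
  protocol          = "-1" # all protocols
  from_port         = 0
  to_port           = 0
  self              = true # destination is this same security group
  security_group_id = module.eks.node_security_group_id
}
```

If the install succeeds with only these two rules in place, that suggests the missing rules are between nodes rather than between the control plane and the nodes.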
[22-04-22] Update 2:
Ultimately, I solved this issue, but ran into one more very common problem. Many others had hit it and had the answer, just not in a form usable with the terraform-aws-eks module.
After I was able to get the istioctl install to work correctly:
istioctl install --set profile=demo
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete
Making this installation the default for injection and validation.
kubectl label namespace default istio-injection=enabled
kubectl apply -f istio-1.13.2/samples/bookinfo/platform/kube/bookinfo.yaml
I saw all the bookinfo pods/deployments fail to start with this:
Internal error occurred: failed calling webhook "namespace.sidecar-injector.istio.io": failed to call webhook: Post "https://istiod.istio-system.svc:443/inject?timeout=10s": context deadline exceeded
The answer to this problem is similar to the answer to the original problem: working firewall ports / security group rules. I've added a separate answer below for clarity. It contains a complete working solution for AWS EKS + Terraform + Istio.
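BLUF: Installing Istio on terraform-aws-eks requires you to add security group rules allowing communication within the node group. You need:
1. Ingress and egress rules for the Istio ports within the node security group, to fix the "transport: Error while dialing dial tcp xx.xx.xx.xx:15012: i/o timeout" error during istioctl install.
2. An ingress rule on port 15017 from the control plane security group to the node security group, to fix the failed calling webhook "namespace.sidecar-injector.istio.io" error.
Unfortunately, I still don't know why this works, since I don't yet understand the order of operations that happens when an Istio-injected pod comes up in a Kubernetes cluster, and who tries to talk to whom.
See the comments in the code below for which set of rules solves which of the two problems from the original question.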
# Ports needed to correctly install Istio for the error message: transport: Error while dialing dial tcp xx.xx.xx.xx:15012: i/o timeout
locals {
  istio_ports = [
    {
      description = "Envoy admin port / outbound"
      from_port   = 15000
      to_port     = 15001
    },
    {
      description = "Debug port"
      from_port   = 15004
      to_port     = 15004
    },
    {
      description = "Envoy inbound"
      from_port   = 15006
      to_port     = 15006
    },
    {
      description = "HBONE mTLS tunnel port / secure networks XDS and CA services (Plaintext)"
      from_port   = 15008
      to_port     = 15010
    },
    {
      description = "XDS and CA services (TLS and mTLS)"
      from_port   = 15012
      to_port     = 15012
    },
    {
      description = "Control plane monitoring"
      from_port   = 15014
      to_port     = 15014
    },
    {
      description = "Webhook container port, forwarded from 443"
      from_port   = 15017
      to_port     = 15017
    },
    {
      description = "Merged Prometheus telemetry from Istio agent, Envoy, and application, Health checks"
      from_port   = 15020
      to_port     = 15021
    },
    {
      description = "DNS port"
      from_port   = 15053
      to_port     = 15053
    },
    {
      description = "Envoy Prometheus telemetry"
      from_port   = 15090
      to_port     = 15090
    },
    {
      description = "aws-load-balancer-controller"
      from_port   = 9443
      to_port     = 9443
    }
  ]

  # One node-to-node ingress rule per port range, keyed by list index
  ingress_rules = {
    for ikey, ivalue in local.istio_ports :
    "${ikey}_ingress" => {
      description = ivalue.description
      protocol    = "tcp"
      from_port   = ivalue.from_port
      to_port     = ivalue.to_port
      type        = "ingress"
      self        = true
    }
  }

  # One node-to-node egress rule per port range, keyed by list index
  egress_rules = {
    for ekey, evalue in local.istio_ports :
    "${ekey}_egress" => {
      description = evalue.description
      protocol    = "tcp"
      from_port   = evalue.from_port
      to_port     = evalue.to_port
      type        = "egress"
      self        = true
    }
  }
}
# The AWS-EKS Module definition
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name                    = "my-test-cluster"
  cluster_version                 = "1.21"
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts = "OVERWRITE"
    }
  }

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    disk_size      = 50
    instance_types = ["m5.large"]
  }

  # Add the Istio node-to-node rules defined in the locals block
  node_security_group_additional_rules = merge(
    local.ingress_rules,
    local.egress_rules
  )

  eks_managed_node_groups = {
    green_test = {
      min_size       = 1
      max_size       = 2
      desired_size   = 2
      instance_types = ["t3.large"]
      capacity_type  = "SPOT"
    }
  }
}
# Port needed to solve the error
# Internal error occurred: failed calling
# webhook "namespace.sidecar-injector.istio.io": failed to
# call webhook: Post "https://istiod.istio-system.svc:443/inject?timeout=10s":
# context deadline exceeded
resource "aws_security_group_rule" "allow_sidecar_injection" {
  description              = "Webhook container port, From Control Plane"
  protocol                 = "tcp"
  type                     = "ingress"
  from_port                = 15017
  to_port                  = 15017
  security_group_id        = module.eks.node_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}
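For anyone who would rather keep every rule inside the module call, the webhook rule can probably be expressed as one more entry for node_security_group_additional_rules instead of a standalone resource. This is an untested sketch under two assumptions: that your v18 module version supports the `source_cluster_security_group` flag in additional node rules, and that a rule sourced from the cluster security group the module creates is sufficient (the standalone resource above uses the EKS-created primary security group instead). The local name `webhook_rule` and the rule key are made up for illustration.

```hcl
locals {
  # Hypothetical local: ingress on the webhook port from the control plane.
  webhook_rule = {
    ingress_istio_webhook = {
      description = "Webhook container port, from control plane"
      protocol    = "tcp"
      from_port   = 15017
      to_port     = 15017
      type        = "ingress"
      # Assumption: v18 resolves this flag to the module-created cluster
      # security group, not module.eks.cluster_primary_security_group_id.
      source_cluster_security_group = true
    }
  }
}
```

It would then be passed as a third argument to the existing merge, i.e. merge(local.ingress_rules, local.egress_rules, local.webhook_rule). Check the module's documentation for your exact version before relying on it.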
Please excuse my possibly terrible Terraform syntax usage. Happy Kuberneteing!