Tags: kubernetes, nginx, kubernetes-helm, amazon-eks, ingress-nginx

Upstream Ingress Routing Issue with ingress-nginx Helm Chart via Terraform with Cert-Manager


We have a Kubernetes cluster that we build out in Terraform. After the cluster is built, Helm charts deploy the essential infrastructure for us. We have been able to automate TLS with cert-manager, but we have been unable to set up routing through the Nginx load balancer for our main ingress. In other words, we have secured our root ingress, which is a service that hosts a web UI at the root level, but downstream services are for some reason not getting proper routing from the root ingress. I am turning to the community for help with the overall configuration needed for our ingress-nginx Helm chart:

Here is the terraform tree:

.
├── README.md
├── data.tf
├── helm.tf
├── locals.tf
├── main.tf
├── outputs.tf
├── providers.tf
└── values
    ├── cert-manager.values.yaml
    ├── cluster-autoscaler.values.yaml
    ├── dapr.values.yaml
    ├── external-dns.values.yaml
    ├── falcosecurity.values.yaml
    ├── ingress-nginx.values.yaml
    ├── k8s-aws-ebs-tagger.values.yaml
    ├── oauth2-proxy.values.yaml
    └── postgresql.values.yaml

Some of our values will not work for you, but the essential ones that deploy the ingress-nginx are as follows:

helm.tf:

#############################
#       HELM CHARTS         #
#############################
resource "helm_release" "ingress-nginx" {
  depends_on = [module.primary, module.cert_manager]
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  version    = "4.7.1"
  namespace  = "ingress"
  lint       = true
  timeout    = "600"
  values     = [file("./values/ingress-nginx.values.yaml")]

  reuse_values     = true
  force_update     = true
  recreate_pods    = true
  cleanup_on_fail  = true
  create_namespace = true

  set {
    name  = "cert-manager.io/cluster-issuer"
    value = module.cert_manager.cluster_issuer_name
  }

}

And our values file that we are ingesting, ingress-nginx.values.yaml:

controller:
  publishService:
    enabled: true
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
  config:
    proxy-body-size: 50m
    ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA"
    ssl-protocols: "TLSv1 TLSv1.1 TLSv1.2 TLSv1.3"

It does deploy Nginx, but downstream services are not routable and we do not understand why. Right now, things work if we put the service on one node, but as soon as we have any additional nodes, routing from Nginx stops working. Are we missing any configuration parameters?

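Here is the Helm chart we are using: ingress-nginx, version 4.7.1, from https://kubernetes.github.io/ingress-nginx (as pinned in helm.tf above).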

Providers we are relying on:

<!-- BEGIN_TF_DOCS -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.3.7 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 4.12.0 |
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | 2.5.1 |
| <a name="requirement_kubectl"></a> [kubectl](#requirement\_kubectl) | 1.13.0 |
| <a name="requirement_kubernetes"></a> [kubernetes](#requirement\_kubernetes) | 2.11.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | 5.13.1 |
| <a name="provider_helm"></a> [helm](#provider\_helm) | 2.5.1 |
| <a name="provider_terraform"></a> [terraform](#provider\_terraform) | n/a |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_cert_manager"></a> [cert\_manager](#module\_cert\_manager) | terraform-iaac/cert-manager/kubernetes | ~> 2.5.1 |
| <a name="module_primary"></a> [primary](#module\_primary) | terraform-aws-modules/eks/aws | ~> 19.16.0 |

## Resources

| Name | Type |
|------|------|
| [aws_iam_role_policy_attachment.primary_node_group_AmazonEBSCSIDriverPolicy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_kms_alias.cluster_key](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_alias) | resource |
| [aws_kms_key.cluster_key](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_key) | resource |
| [helm_release.aws-ebs-csi-driver](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.cluster-autoscaler](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.dapr](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.datadog](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.externaldns](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.falcosecurity](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.harness-delegate-ng](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.metrics-server](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.oauth2-proxy](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.postgresql](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.secrets-store-csi-driver](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [helm_release.vertical-pod-autoscaler](https://registry.terraform.io/providers/hashicorp/helm/2.5.1/docs/resources/release) | resource |
| [aws_eks_cluster_auth.primary](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
| [aws_secretsmanager_secret_version.datadog_api_key](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_secretsmanager_secret_version.harness_delegate_token](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_secretsmanager_secret_version.oauth_proxy_client_id](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_secretsmanager_secret_version.oauth_proxy_client_secret](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_secretsmanager_secret_version.oauth_proxy_cookie_secret](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_secretsmanager_secret_version.postgresql_password](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret_version) | data source |
| [aws_vpc.shared](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc) | data source |
| [terraform_remote_state.subnets](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |

## Inputs

No inputs.

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_aws_auth_configmap_yaml"></a> [aws\_auth\_configmap\_yaml](#output\_aws\_auth\_configmap\_yaml) | n/a |
| <a name="output_cluster_arn"></a> [cluster\_arn](#output\_cluster\_arn) | The Kubernetes Cluster Arn |
| <a name="output_cluster_certificate_authority_data"></a> [cluster\_certificate\_authority\_data](#output\_cluster\_certificate\_authority\_data) | The Kubernetes Cluster Certificate Authority |
| <a name="output_cluster_endpoint"></a> [cluster\_endpoint](#output\_cluster\_endpoint) | The Kubernetes cluster host endpoint |
| <a name="output_cluster_id"></a> [cluster\_id](#output\_cluster\_id) | The Kubernetes ClusterID |
| <a name="output_cluster_name"></a> [cluster\_name](#output\_cluster\_name) | The Kubernetes Cluster Name |
| <a name="output_cluster_oidc_issuer_url"></a> [cluster\_oidc\_issuer\_url](#output\_cluster\_oidc\_issuer\_url) | The URL on the EKS cluster for the OpenID Connect identity provider |
| <a name="output_cluster_platform_version"></a> [cluster\_platform\_version](#output\_cluster\_platform\_version) | Platform version for the cluster |
| <a name="output_cluster_status"></a> [cluster\_status](#output\_cluster\_status) | Status of the EKS cluster. One of `CREATING`, `ACTIVE`, `DELETING`, `FAILED` |
<!-- END_TF_DOCS -->

The Kubernetes logs are as follows:

kubectl logs -f ingress-nginx-controller-758f8cbd4d-5cwvq -n ingress-nginx
2020/05/21 02:34:00 [error] 330#330: *10136 upstream timed out (110: Operation timed out) while connecting to upstream, client: 192.168.1.71, server: myip.qql.com, request: "GET / HTTP/1.1", upstream: "http://10.122.69.209:8080/", host: "myip.qql.com"
2020/05/21 02:34:05 [error] 330#330: *10136 upstream timed out (110: Operation timed out) while connecting to upstream, client: 192.168.1.71, server: myip.qql.com, request: "GET / HTTP/1.1", upstream: "http://10.122.69.209:8080/", host: "myip.qql.com"
2020/05/21 02:34:10 [error] 330#330: *10136 upstream timed out (110: Operation timed out) while connecting to upstream, client: 192.168.1.71, server: myip.qql.com, request: "GET / HTTP/1.1", upstream: "http://10.122.69.209:8080/", host: "myip.qql.com"
192.168.1.71 - - [21/May/2020:02:34:10 +0000] "GET / HTTP/1.1" 504 168 "-" "curl/7.29.0" 76 15.001 [default-myip-8080] [] 10.122.69.209:8080, 10.122.69.209:8080, 10.122.69.209:8080 0, 0, 0 5.000, 5.001, 5.001 504, 504, 504 37af0db14d310d324bfc5e9919fbe7e4

This is what happens when we try to access a service that is downstream from the root ingress. I know it has something to do with Nginx, but I don't know where to start. We have tried multiple configurations in the values file, to no avail.


Solution

  • So we found out it was a security group rule on inter-node communication. We had to open the rules up manually in the AWS Console, and we then added them to the main Terraform that spins up the EKS cluster using the terraform-aws-modules/eks module.
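
A minimal sketch of what such a rule can look like in Terraform, assuming the `node_security_group_additional_rules` input of terraform-aws-modules/eks v19. The exact rule that was added is not shown in this post, so the rule key `ingress_self_all` and the wide-open port and protocol range are illustrative assumptions; a tighter range may be enough for your workloads.

```hcl
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.16.0"

  # Existing cluster arguments (cluster name, VPC, subnets, node groups)
  # are omitted here; keep whatever the module block already contains.

  # Assumption: allow all traffic between worker nodes that share the
  # node security group, so the ingress-nginx controller on one node can
  # reach upstream pods scheduled on another node.
  node_security_group_additional_rules = {
    ingress_self_all = {
      description = "Node to node, all ports and protocols"
      protocol    = "-1" # all protocols
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true # source is the node security group itself
    }
  }
}
```

Depending on the module version and settings, some node-to-node rules may already be created by default, so it is worth comparing the node security group in the AWS Console against the upstream pod port (8080 in the logs above) before widening it.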