amazon-web-services, terraform, github-actions, amazon-eks, terraform-aws-modules

Terraform Destroy on EKS Fails within GitHub Actions Workflow


Terraform Destroy fails within our workflow:

GitHub Integration Action/Workflow:

name: 'integration'
on:
  push:
    branches:
      - '**'
      - '!main'
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
  deployments: write
jobs:
  integration:
    runs-on: ubuntu-latest
    concurrency:
      group: canary
      cancel-in-progress: false
    defaults:
      run:
        working-directory: examples/complete/
    steps:
      - name: 'Checkout'
        uses: actions/checkout@v3
      - name: 'Extract branch name'
        id: extract_branch
        shell: bash
        run: echo "branch=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::702906146300:role/terraform-aws-eks-primary
          aws-region: us-west-2
      - name: 'Setup Terraform'
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.3.7
      - name: 'Terraform Init'
        id: init
        run: terraform init -force-copy -input=false
      - name: 'Terraform Validate'
        id: validate
        run: terraform validate -no-color
      - name: 'Terraform Plan'
        id: plan
        run: terraform plan -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true" -no-color -input=false
      - name: 'Start deployment'
        uses: bobheadxi/deployments@v1
        id: deployment
        with:
          step: start
          token: ${{ secrets.GITHUB_TOKEN }}
          env: canary
      - name: 'Terraform Apply'
        id: apply
        run: |
          terraform apply -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true" -no-color -input=false -auto-approve
          terraform apply -no-color -input=false -auto-approve
      - name: 'Terraform Destroy'
        id: destroy
        if: always()
        run: terraform destroy -no-color -input=false -auto-approve
      - name: 'Finish deployment'
        uses: bobheadxi/deployments@v1
        if: always()
        with:
          step: finish
          token: ${{ secrets.GITHUB_TOKEN }}
          status: ${{ job.status }}
          env: ${{ steps.deployment.outputs.env }}
          env_url: https://github.com/${{ github.repository }}/actions?query=workflow%3A${{ github.workflow }}+branch%3A${{ steps.extract_branch.outputs.branch }}
          deployment_id: ${{ steps.deployment.outputs.deployment_id }}

The command it fails on is:

terraform destroy -no-color -input=false -auto-approve

I am specifically using a module to spin up EKS on AWS with Terraform.

Module: https://github.com/terraform-aws-modules/terraform-aws-eks

I have tried using multiple versions, but with very limited success. I don't think the problem is the Terraform configuration itself, but rather the module or the terraform command I am using to destroy the infrastructure. The EKS cluster does eventually get destroyed, but since I am allowing the module to manage the cluster's security groups, it seems to fail to actually delete those security groups due to a dependency from the way EKS is spun up, with the virtual nodes getting VPC access via the cluster and node security groups.
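One way to check whether the group named in the error is the EKS-managed cluster security group (as opposed to one the module created) is to look it up with the CLI. A small sketch, assuming CLUSTER_NAME holds whatever var.cluster_name resolves to:

# Print the security group that EKS itself attaches to the cluster ENIs;
# this is often the group that still has dependent objects at destroy time.
aws eks describe-cluster \
  --name "$CLUSTER_NAME" \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text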

The error:

Error: deleting Security Group (sg-0b11ee4a81d0092b2): DependencyViolation: resource sg-0b11ee4a81d0092b2 has a dependent object
    status code: 400, request id: 8a8dfd26-5198-4bbd-9f0b-84131c248434

main.tf:

################################################
#          KMS CLUSTER ENCRYPTION KEY          #
################################################
resource "aws_kms_key" "this" {
  description             = "EKS Cluster Encryption Key"
  deletion_window_in_days = 7
  enable_key_rotation     = true
}

resource "aws_kms_alias" "this" {
  name          = "alias/eks_cluster_encryption_key"
  target_key_id = aws_kms_key.this.key_id
}

##################################
#       KUBERNETES CLUSTER       #
##################################
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.4.3"

  cluster_name                    = var.cluster_name
  cluster_version                 = var.cluster_version
  cluster_endpoint_private_access = var.cluster_endpoint_private_access
  cluster_endpoint_public_access  = var.cluster_endpoint_public_access

  create_cloudwatch_log_group = false

  create_kms_key = false
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = aws_kms_key.this.arn
  }

  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  manage_aws_auth_configmap  = true
  aws_auth_roles             = var.aws_auth_roles

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type       = var.ami_type
    disk_size      = var.disk_size
    instance_types = var.instance_types

    iam_role_attach_cni_policy = var.iam_role_attach_cni_policy
  }

  eks_managed_node_groups = {
    primary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type = "ON_DEMAND"
    }
    secondary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type = "SPOT"
    }
  }

  cluster_addons = {
    coredns = {
      most_recent = true

      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    kube-proxy = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    aws-ebs-csi-driver = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    vpc-cni = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
  }

  tags = {
    repo  = "https://github.com/impinj-di/terraform-aws-eks-primary"
    team  = "di"
    owner = "[email protected]"
  }
}

####################################
#       KUBERNETES RESOURCES       #
####################################
resource "kubernetes_namespace" "this" {
  depends_on = [module.primary]
  for_each   = toset(local.eks_namespaces)
  metadata {
    name = each.key
  }
}

As you can see, I am not specifying the cluster and/or node-to-node security groups; in other words, we want the defaults.
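To see exactly which security groups the module is managing on our behalf, the state can be listed directly (a quick sketch; run from examples/complete/ against the same backend):

# List every security-group resource recorded in Terraform state;
# these are the groups terraform destroy will try to delete itself.
terraform state list | grep -i security_group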

Should I use the following within my workflow to destroy the infrastructure, and would it make a difference?

terraform destroy -force -no-color -input=false -auto-approve

Solution

  • Most probably there is an ENI still attached to that security group, which prevents its deletion and produces that DependencyViolation error.

    Your solution is to figure out the problematic ENI, either from the

    • UI, by trying to delete the security group manually and checking the error, or
    • CLI, by running aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-0b11ee4a81d0092b2 --query 'NetworkInterfaces[*].NetworkInterfaceId'

    and then deleting it manually and re-running the destroy command. That is only a temporary, one-time fix for this error, though (a scripted version of the same cleanup is sketched below).

    There is no simple Terraform-side solution, since you didn't write the Terraform code yourself: you are using a ready-made module, which seems prone to the error you are reporting because of how dependencies are managed in the terraform-aws-eks repo.

    As for terraform destroy -force: the -force flag was deprecated in favor of -auto-approve and removed in Terraform 0.15, so on the pinned Terraform 1.3.7 it will simply be rejected rather than make a difference.
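    If you want to automate that one-time cleanup in the workflow before retrying, something like the sketch below should work. It is only an illustration: SG_ID is the group from your error message, and requester-managed ENIs (such as the ones the EKS control plane owns) may refuse deletion and simply need time to be released.

    #!/usr/bin/env bash
    set -euo pipefail

    SG_ID="sg-0b11ee4a81d0092b2"   # the group from the DependencyViolation error

    # Find every ENI that still references the security group.
    enis=$(aws ec2 describe-network-interfaces \
      --filters "Name=group-id,Values=${SG_ID}" \
      --query 'NetworkInterfaces[*].NetworkInterfaceId' \
      --output text)

    for eni in $enis; do
      # Detach first if the ENI is attached; this can fail for
      # requester-managed ENIs, hence the || true.
      attachment=$(aws ec2 describe-network-interfaces \
        --network-interface-ids "$eni" \
        --query 'NetworkInterfaces[0].Attachment.AttachmentId' \
        --output text)
      if [ "$attachment" != "None" ]; then
        aws ec2 detach-network-interface --attachment-id "$attachment" --force || true
      fi
      aws ec2 delete-network-interface --network-interface-id "$eni" || true
    done

    # With the dependent ENIs gone, the destroy should be able to finish.
    terraform destroy -no-color -input=false -auto-approve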