
API Gateway V2 private integration to application load balancer always returns 503


I have an API Gateway V2 connected to an Application Load Balancer via a VPC Link defined in Terraform like so:

resource "aws_apigatewayv2_vpc_link" "alb_connection" {
  name               = "${local.full_name}-vpc-link"
  security_group_ids = [aws_security_group.alb_sg.id]
  subnet_ids         = data.aws_subnets.web.ids
  tags               = module.network_label.tags
}

resource "aws_cognito_resource_server" "alb_connection" {
  identifier = "${var.environment}.${var.area}.${var.application}"
  name       = "${var.environment}-${local.full_name}-rs"

  dynamic "scope" {
    for_each = var.scopes

    content {
      scope_name        = scope.value
      scope_description = "Allow ${scope.value} access to ${local.full_name}/${var.area}"
    }
  }

  user_pool_id = one(data.aws_cognito_user_pools.selected.ids)
}

resource "aws_apigatewayv2_authorizer" "alb_connection" {
  name             = "http-api-request-authorizer"
  api_id           = data.aws_apigatewayv2_api.gateway.id
  authorizer_type  = "REQUEST"
  identity_sources = ["$request.header.Authorization"]
  authorizer_uri   = data.aws_lambda_function.authorizer.invoke_arn

  enable_simple_responses           = true
  authorizer_payload_format_version = "2.0"
}

resource "aws_apigatewayv2_route" "alb_connection" {
  api_id    = data.aws_apigatewayv2_api.gateway.id
  route_key = "ANY /${var.area}/{proxy+}"
  target    = "integrations/${aws_apigatewayv2_integration.alb_connection.id}"

  authorizer_id      = aws_apigatewayv2_authorizer.alb_connection.id
  authorization_type = "CUSTOM"
  # authorization_scopes = ["${aws_cognito_resource_server.alb_connection.identifier}/${each.value}"]
}

resource "aws_apigatewayv2_integration" "alb_connection" {
  api_id                 = data.aws_apigatewayv2_api.gateway.id
  description            = "Integration connecting ${var.api_gateway_name} to ${local.full_name}"
  integration_type       = "HTTP_PROXY"
  integration_uri        = aws_lb_listener.this.arn
  payload_format_version = "1.0" # Required for HTTP_PROXY integrations (private integrations)

  integration_method = "ANY"
  connection_type    = "VPC_LINK"
  connection_id      = aws_apigatewayv2_vpc_link.alb_connection.id

  request_parameters = {
    "append:header.authforintegration" = "$context.authorizer.authorizerResponse"
  }

  response_parameters {
    status_code = 403
    mappings = {
      "append:header.auth" = "$context.authorizer.authorizerResponse"
    }
  }
}

resource "aws_lb" "alb" {

  # Set the basic fields of the load balancer
  name                       = "${local.full_name}-alb-${random_string.id.result}"
  internal                   = true
  load_balancer_type         = "application"
  enable_deletion_protection = false
  security_groups            = [aws_security_group.alb_sg.id]
  subnets                    = data.aws_subnets.web.ids

  # Setup logging for access
  access_logs {
    bucket  = data.aws_s3_bucket.audit.id
    prefix  = local.full_path
    enabled = true
  }

  # Setup connection logs
  connection_logs {
    bucket  = data.aws_s3_bucket.audit.id
    prefix  = local.full_path
    enabled = true
  }

  tags = module.alb_group_label.tags

  # Ensure a new instance of this load balancer is created before the old one is destroyed to avoid downtime
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_s3_bucket_policy" "allow_log_access" {
  bucket = data.aws_s3_bucket.audit.id
  policy = data.aws_iam_policy_document.alb_access_policy.json
}

resource "aws_lb_target_group" "target_group" {
  name        = "${local.full_name}-tg-${random_string.id.result}"
  target_type = "ip"
  protocol    = "HTTP"
  port        = var.service_port
  vpc_id      = data.aws_vpc.main.id

  tags = module.alb_group_label.tags

  slow_start = var.slow_start
  health_check {
    enabled             = true
    port                = var.service_port
    interval            = 30
    timeout             = 5
    protocol            = "HTTP"
    path                = var.health_check
    matcher             = "200"
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }

  # Ensure a new instance of this target group is created before the old one is destroyed to avoid downtime
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb_listener" "this" {
  load_balancer_arn = aws_lb.alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.ssl_certificate.arn
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.target_group.arn
  }
  tags = module.alb_group_label.tags

  # Need to ensure that the listener is destroyed before the target group or we won't be able to destroy the target
  # group at all.
  lifecycle {
    create_before_destroy = true
    replace_triggered_by = [
      aws_lb_target_group.target_group
    ]
  }
}

I've been able to get this deployed. When I look at my API Gateway, I can see the route, the authorizer, and the integration, and they appear to reference each other correctly. The VPC link appears as well, and my ALB target is showing green and healthy. Requests to my authorizer return the correct response, and the API Gateway logs confirm this.

However, every request returns a 503 and none of them actually reaches the ALB. There are no access or connection logs stored in my S3 bucket, nor do any of the VPC flow logs show traffic into the ALB. Since I cannot see into the integration to understand what it's doing, I'm not sure what I'm doing wrong. Does anyone know what the problem is and how I can fix it?


Solution

  • There were actually a couple of issues with this setup. First, the ALB and the VPC link shared the same security group, which had no rule allowing traffic between its own members. The solution was to give the VPC link and the ALB separate security groups and ensure that each had rules allowing requests to flow down the chain:

    # Create the security group for the load balancer
    resource "aws_security_group" "alb_sg" {
      #checkov:skip=CKV2_AWS_5: "Ensure that Security Groups are attached to another resource"
      name        = "${local.full_name}-alb-sg"
      description = "Security group for the ${local.full_name} load balancer"
      vpc_id      = data.aws_vpc.main.id
      tags        = module.security_group_label.tags
    }
    
    # Allow HTTPS inbound traffic from within the VPC (where the VPC link lives) to our load balancer
    resource "aws_vpc_security_group_ingress_rule" "api_gateway_ingress" {
      security_group_id = aws_security_group.alb_sg.id
      description       = "Allow the load balancer to receive HTTP requests from the API Gateway"
      from_port         = 443
      to_port           = 443
      ip_protocol       = "tcp"
      cidr_ipv4         = data.aws_vpc.main.cidr_block
      tags              = module.security_group_label.tags
    }
    
    # Allow HTTP outbound traffic from our load balancer to our service
    resource "aws_vpc_security_group_egress_rule" "allow_service_access" {
      security_group_id            = aws_security_group.alb_sg.id
      description                  = "Allow the load balancer to forward requests to the ${local.full_name} service"
      from_port                    = var.service_port
      to_port                      = var.service_port
      ip_protocol                  = "tcp"
      referenced_security_group_id = aws_security_group.service_sg.id
      tags                         = module.security_group_label.tags
    }
    
    # Create the security group for the VPC link
    resource "aws_security_group" "vpc_link_sg" {
      #checkov:skip=CKV2_AWS_5: "Ensure that Security Groups are attached to another resource"
      name        = "${local.full_name}-vpc-link-sg"
      description = "Security group for the ${local.full_name} VPC link"
      vpc_id      = data.aws_vpc.main.id
      tags        = module.security_group_label.tags
    }
    
    # Allow HTTPS inbound traffic from within the VPC to the VPC link
    resource "aws_vpc_security_group_ingress_rule" "vpc_link_ingress" {
      security_group_id = aws_security_group.vpc_link_sg.id
      description       = "Allow the VPC link to receive HTTP requests from the API Gateway"
      from_port         = 443
      to_port           = 443
      ip_protocol       = "tcp"
      cidr_ipv4         = data.aws_vpc.main.cidr_block
      tags              = module.security_group_label.tags
    }
    
    # Allow HTTPS traffic down to the load balancer from the VPC link
    resource "aws_vpc_security_group_egress_rule" "vpc_link_access" {
      security_group_id            = aws_security_group.vpc_link_sg.id
      description                  = "Allow the VPC link to forward requests to the ${local.full_name} load balancer"
      from_port                    = 443
      to_port                      = 443
      ip_protocol                  = "tcp"
      referenced_security_group_id = aws_security_group.alb_sg.id
      tags                         = module.security_group_label.tags
    }
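    
    For the separation to take effect, the VPC link from the question also has to be attached to the new group, since it originally referenced alb_sg. A minimal sketch, assuming the resource names used above and leaving the other arguments unchanged. The second resource is an optional, stricter alternative to the CIDR-based ingress rule on alb_sg; its name is hypothetical and it is not part of the original fix:

    resource "aws_apigatewayv2_vpc_link" "alb_connection" {
      name               = "${local.full_name}-vpc-link"
      security_group_ids = [aws_security_group.vpc_link_sg.id] # previously alb_sg
      subnet_ids         = data.aws_subnets.web.ids
      tags               = module.network_label.tags
    }

    # Optional: accept HTTPS only from members of the VPC link security group
    # instead of from the whole VPC CIDR (hypothetical resource name)
    resource "aws_vpc_security_group_ingress_rule" "alb_from_vpc_link" {
      security_group_id            = aws_security_group.alb_sg.id
      description                  = "Allow the load balancer to receive HTTPS requests from the VPC link"
      from_port                    = 443
      to_port                      = 443
      ip_protocol                  = "tcp"
      referenced_security_group_id = aws_security_group.vpc_link_sg.id
      tags                         = module.security_group_label.tags
    }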
    

    However, that alone was not enough to resolve the issue. There was still a TLS mismatch between the API Gateway integration and the ALB. To fix this, I added a tls_config block to the API Gateway integration:

    tls_config {
      server_name_to_verify = local.service_discovery_domain
    }
    

    Here, local.service_discovery_domain is a subdomain of a public DNS zone we had access to. We then added a CNAME record for the same name in a private hosted zone, pointing at the ALB:

    resource "aws_route53_record" "alb" {
      zone_id = data.aws_route53_zone.private_dns.zone_id
      name    = local.service_discovery_domain
      type    = "CNAME"
      ttl     = 300
      records = [aws_lb.alb.dns_name]
    }
    

    This allowed AWS to properly route requests from the API gateway to the ALB and conduct TLS negotiation successfully.
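
    Put together, the integration from the question with the tls_config block in place might look like the sketch below. The hostname in the locals block is a placeholder for illustration, not the value used here, and the request_parameters and response_parameters from the question are omitted for brevity. Whatever name is chosen, the certificate on the ALB's HTTPS listener needs to cover it, since that is the name API Gateway verifies the certificate against:

    locals {
      # Hypothetical value; omit this block if the local is already defined
      service_discovery_domain = "my-service.example.com"
    }

    resource "aws_apigatewayv2_integration" "alb_connection" {
      api_id                 = data.aws_apigatewayv2_api.gateway.id
      integration_type       = "HTTP_PROXY"
      integration_uri        = aws_lb_listener.this.arn
      integration_method     = "ANY"
      connection_type        = "VPC_LINK"
      connection_id          = aws_apigatewayv2_vpc_link.alb_connection.id
      payload_format_version = "1.0"

      # Hostname API Gateway expects on the certificate presented by the ALB listener
      tls_config {
        server_name_to_verify = local.service_discovery_domain
      }
    }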