go, amazon-vpc, connection-timeout, aws-sts, vpc-endpoint

Request times out when trying to assume a role with AWS STS from a private subnet using a VPC Endpoint


I'm calling AWS STS to assume a role from a Lambda function that runs in a private subnet of a VPC with an Endpoint configured for STS. However, the request times out.

My setup is as follows:

  • I run a lambda attached to a private subnet and security group in a VPC
  • Because the subnet is private, I've configured a VPC Endpoint to access STS on com.amazonaws.eu-west-1.sts
  • My lambda is written in Go using the older sdk-for-go v1 API: https://docs.aws.amazon.com/sdk-for-go/api/
  • I've also configured a VPC Endpoint to access S3, which works without problems

My terraform configuration for the VPC endpoint is:

resource "aws_vpc_endpoint" "xxxx-sts" {
  vpc_id = aws_vpc.xxxx.id
  service_name = "com.amazonaws.eu-west-1.sts"
  vpc_endpoint_type = "Interface"
  security_group_ids = [aws_security_group.xxxx.id]
  subnet_ids = [aws_subnet.xxxx.id]
  private_dns_enabled = true
}

Solution

  • To fix this problem, set the following environment variable in your lambda or application environment:

    export AWS_STS_REGIONAL_ENDPOINTS='regional'
    

    This forces the AWS SDK to use regional rather than global endpoints when calling STS, as documented here: https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html

    Without this setting, the Go SDK defaults to the global STS endpoint https://sts.amazonaws.com, even when the client is configured for a region such as eu-west-1. This default applies to the following regions: ap-northeast-1, ap-south-1, ap-southeast-1, ap-southeast-2, aws-global, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, and us-west-2.

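    The STS VPC Endpoint only serves the regional hostname (sts.eu-west-1.amazonaws.com in this case). The global hostname still resolves to a public address, which is not reachable from a private subnet with no route to the internet, so the connection is never established and the request times out.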
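
To make the endpoint selection described above concrete, here is a minimal Go sketch that models it using only the standard library. It is an illustration, not the SDK's actual resolver: `stsHost` and `legacyGlobalRegions` are hypothetical names, the region list is copied from this answer, and the sketch assumes the standard `aws` partition (hostnames ending in `amazonaws.com`).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const globalSTSHost = "sts.amazonaws.com"

// legacyGlobalRegions holds the regions that, per the list in this answer,
// resolve STS to the global endpoint unless regional mode is enabled.
var legacyGlobalRegions = map[string]bool{
	"ap-northeast-1": true, "ap-south-1": true, "ap-southeast-1": true,
	"ap-southeast-2": true, "ca-central-1": true, "eu-central-1": true,
	"eu-north-1": true, "eu-west-1": true, "eu-west-2": true,
	"eu-west-3": true, "sa-east-1": true, "us-east-1": true,
	"us-east-2": true, "us-west-1": true, "us-west-2": true,
}

// stsHost is a hypothetical helper that returns the STS hostname a client
// would target for the given region and AWS_STS_REGIONAL_ENDPOINTS value.
// Assumption: any value other than "regional" (including empty) is treated
// as the legacy behaviour in this sketch.
func stsHost(region, mode string) string {
	if region == "aws-global" {
		return globalSTSHost
	}
	if !strings.EqualFold(mode, "regional") && legacyGlobalRegions[region] {
		return globalSTSHost
	}
	return "sts." + region + ".amazonaws.com"
}

func main() {
	mode := os.Getenv("AWS_STS_REGIONAL_ENDPOINTS")
	// Only the regional hostname is served by the VPC Endpoint; the global
	// one needs a route to the internet.
	fmt.Println("STS host for eu-west-1:", stsHost("eu-west-1", mode))
}
```

In a setup like the one in the question, the Lambda can only reach STS when this resolves to the regional hostname. If changing the environment is not an option, SDK v1 also appears to expose the same switch in code through the `STSRegionalEndpoint` field of `aws.Config` (set to `endpoints.RegionalSTSEndpoint`); check the SDK version in use before relying on it.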