Tags: amazon-web-services, aws-lambda, amazon-rds, aws-cdk, amazon-vpc

Lambda function connection to RDS timing out (private isolated subnets only, no NAT gateway or IGW)


Does anyone know whether it's possible to connect Lambda to RDS in the same VPC using only isolated subnets, with both resources in the same subnets? I've been fiddling and searching far and wide for a few days now, and even GPT can't get it. This is more a question of "can it actually work this way" than of the recommended way to do it.

I'm trying to avoid a NAT gateway, which costs about $0.045 per hour, and just use the free-tier RDS instance for development. Adding a NAT gateway with public subnets and private-with-egress subnets works, but I'd like to know whether this is possible with entirely isolated subnets and no NAT gateway.

Here's where I'm at:

  • A route table is associated with all subnets and has a route for the VPC CIDR with target local.
  • NACLs are set to allow all inbound and outbound traffic (I also tried restricting them to the specific subnet CIDR).
  • The VPC and subnets are properly attached to the Lambda function.
  • The RDS instance is not publicly accessible.
  • The same VPC and security group, with the subnet group, are attached to the RDS instance (yes, I know this isn't ideal for prod). I tried having the security group reference itself, tried a security group allowing all traffic, and even had RDS connect to Lambda via the console, where it attaches the security groups itself.
  • The Lambda layer is not the issue, and the timeout setting is not too short.

Here's what the CDK looks like for the VPC and security groups. The create_lambda_function helper attaches them accordingly.

Any help is much appreciated. I'm driving myself a little crazy trying to get it to connect.

# Create a VPC with no NAT gateways and only private isolated subnets
vpc = ec2.Vpc(
    self,
    "VPCNoNat",
    cidr="10.0.0.0/24",
    nat_gateways=0,  # No NAT gateways
    subnet_configuration=[
        ec2.SubnetConfiguration(
            name="PrivateSubnetNoNat",
            subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
            cidr_mask=26,
        ),
    ],
)

# Create a security group shared by the RDS instance and the Lambda function
security_group = ec2.SecurityGroup(
    self,
    "cdk-securitygroupNoNat",
    vpc=vpc,
    security_group_name="cdk-security-group-no-nat",
)

# Allow traffic on port 5432 from within the VPC for RDS
security_group.add_ingress_rule(ec2.Peer.ipv4(vpc.vpc_cidr_block), ec2.Port.tcp(5432))

# Create a subnet group
subnet_group = rds.SubnetGroup(
    self,
    "cdkSubnetGroupNoNat",
    description="Subnet group for cdk no nat",
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED),
)

db_instance = rds.DatabaseInstance(
    self,
    "PostgresInstance1NoNat",
    engine=rds.DatabaseInstanceEngine.POSTGRES,
    instance_type=ec2.InstanceType.of(
        ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO
    ),
    credentials=rds.Credentials.from_generated_secret("postgres"),
    vpc=vpc,
    security_groups=[security_group],
    subnet_group=subnet_group,
    publicly_accessible=False,
)


# Create a Lambda function
def create_lambda_function(
    stack,
    lambda_id,
    handler,
    layers,
    vpc,
    security_group,
    role,
    runtime=_lambda.Runtime.PYTHON_3_9,
    timeout=Duration.seconds(15),
    extra_env_vars=None,
):
    # Default environment variables
    default_env_vars = {
        "SECRET_ARN": db_instance.secret.secret_arn,
        "DB_HOST": db_instance.db_instance_endpoint_address,
    }

    # Merge extra environment variables with the default ones
    environment_vars = {**default_env_vars, **(extra_env_vars or {})}

    return _lambda.Function(
        stack,
        lambda_id,
        handler=handler,
        runtime=runtime,
        environment=environment_vars,
        code=_lambda.Code.from_asset(handler),
        role=role,
        layers=layers,
        vpc=vpc,
        security_groups=[security_group],
        vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED),
        timeout=timeout,
    )

Solution

  • UPDATE: Given the additional detail in the comment, the issue is not database connectivity. The function times out because it calls an AWS API (Secrets Manager) from subnets that have no internet access. To solve this, create a VPC endpoint for Secrets Manager:

    endpoint = ec2.InterfaceVpcEndpoint(
        self,
        "SMEndpoint",
        service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
        vpc=vpc,
    )
    
    # optional; ensures the endpoint exists before the Lambda function is provisioned
    my_lambda.node.add_dependency(endpoint)
    

    Reference: https://docs.aws.amazon.com/lambda/latest/dg/foundation-networking.html
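
To see which hop is actually timing out, the two connections can be tested separately from inside the function. The following is a minimal sketch using only the standard library; `can_connect` and `probe` are hypothetical helper names, and the hostname pattern `secretsmanager.<region>.amazonaws.com` is assumed to be the regional Secrets Manager endpoint the SDK would call.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def probe(db_host: str, region: str) -> dict:
    """Check the database hop and the Secrets Manager API hop independently."""
    return {
        "rds_5432": can_connect(db_host, 5432),
        "secretsmanager_443": can_connect(
            f"secretsmanager.{region}.amazonaws.com", 443
        ),
    }
```

Called from the handler as `probe(os.environ["DB_HOST"], os.environ["AWS_REGION"])` (with `import os`), this would be expected to report the database as reachable and Secrets Manager as unreachable in the setup described in the question, until the endpoint above is created.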

    Original answer providing general CDK guidance, although this is not the cause of the issue:

    I would strongly suggest not dealing with Security Groups directly at all.

    From the CDK docs on SecurityGroup:

    Direct manipulation of the Security Group through addIngressRule and addEgressRule is possible, but mutation through the .connections object is recommended. If you peer two constructs with security groups this way, appropriate rules will be created in both.

    It would simplify your code quite a bit:

    vpc = ec2.Vpc(
        self,
        "VPCNoNat",
        cidr="10.0.0.0/24",
        nat_gateways=0,  # No NAT gateways
        # Note: this default creates one public and one isolated subnet group per AZ
        subnet_configuration=ec2.Vpc.DEFAULT_SUBNETS_NO_NAT,
    )
    
    db_instance = rds.DatabaseInstance(
        self,
        "PostgresInstance1NoNat",
        engine=rds.DatabaseInstanceEngine.POSTGRES,
        instance_type=ec2.InstanceType.of(
            ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO
        ),
        credentials=rds.Credentials.from_generated_secret("postgres"),
        vpc=vpc,
        publicly_accessible=False,
    )
    function = _lambda.Function(
        self,
        "MyFunction",
        handler=handler,
        runtime=runtime,
        environment=environment_vars,
        code=_lambda.Code.from_asset(handler),
        layers=layers,
        vpc=vpc,
    )
    
    function.connections.allow_to_default_port(db_instance)
    

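    Connections.allow_to_default_port takes care of the security group rules: it allows traffic from the function's security group to the database's security group on the database's default port (5432 for PostgreSQL). You should not have to create the security groups or subnet groups yourself, because CDK generates them when none are passed in.

    That said, if your function makes any AWS API calls, you will need to create a VPC endpoint for each AWS service it calls, because isolated subnets have no route to the public service endpoints.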
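
As a rough sketch of the "one endpoint per service" point, the endpoints can be declared alongside the VPC. This assumes CDK v2 imports; the stack name `IsolatedNetworkStack` is hypothetical, and the SQS entry is only an illustrative assumption standing in for whatever other services the function might call.

```python
from aws_cdk import Stack, aws_ec2 as ec2
from constructs import Construct


class IsolatedNetworkStack(Stack):  # hypothetical stack name
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Isolated-only VPC, as in the question (no NAT gateway, no public subnets)
        vpc = ec2.Vpc(
            self,
            "VPCNoNat",
            nat_gateways=0,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="PrivateSubnetNoNat",
                    subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
                    cidr_mask=26,
                ),
            ],
        )

        # One interface endpoint per AWS API the function calls.
        interface_services = {
            "SMEndpoint": ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
            # Assumption for illustration only; include just the services you use.
            "SqsEndpoint": ec2.InterfaceVpcEndpointAwsService.SQS,
        }
        for endpoint_id, service in interface_services.items():
            vpc.add_interface_endpoint(endpoint_id, service=service)

        # S3 and DynamoDB are reached through gateway endpoints instead.
        vpc.add_gateway_endpoint(
            "S3Endpoint", service=ec2.GatewayVpcEndpointAwsService.S3
        )
```

Interface endpoints are billed per hour in each Availability Zone they are deployed to, so for a cost-sensitive development setup it is probably worth creating only the ones the function actually needs.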