amazon-web-services rest aws-lambda amazon-iam aws-security-group

How to properly set AWS inbound rules to accept response from external REST API call


My use case
I have a function hosted on AWS Lambda that calls an external API. In my case it is Trello's terrific and well-defined API.

My problem in a nutshell (TL;DR: feel free to jump to the Problem Summary below)
I had my external API call to Trello working properly. Now it is not working. I suspect I changed networking permissions within AWS that now block the returned response from the service provider. Details to follow.

My testing
I have tested my call to the API using Postman, so I know I have a well-formed request and a useful returned response from the service provider. The business logic is OK. For reference, here is the API call I am using. I have obfuscated my key and token for obvious reasons:

https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name=New+cards+are+cool

This should put a new card on my Trello board, and in Postman (running on my local machine) it does so successfully. In fact, I had this working in an AWS Lambda function I recently deployed. Here is the call (note that I'm using the urllib3 library recommended by AWS):

    http.request("POST", "https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name="+card_name+"&desc="+card_description)
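Concatenating card_name and card_description straight into the query string only works as long as they contain no spaces, ampersands, or other reserved characters. Here is a minimal sketch of building the same URL with the standard library's urlencode. The helper name build_card_url and its parameters are hypothetical and not part of the original function:

```python
from urllib.parse import urlencode

TRELLO_CARDS_URL = "https://api.trello.com/1/cards"


def build_card_url(key, token, id_list, card_name, card_description):
    """Return the Trello 'create card' URL with every parameter percent-encoded."""
    query = urlencode({
        "key": key,
        "token": token,
        "idList": id_list,
        "name": card_name,
        "desc": card_description,
    })
    return f"{TRELLO_CARDS_URL}?{query}"


# Hypothetical usage inside the Lambda handler (http is a urllib3.PoolManager):
# http.request("POST", build_card_url(my_key, my_token, my_list, card_name, card_description))
```

This does not change what is sent for simple names; it only keeps the request well-formed when the card text contains characters such as & or #.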

Furthermore, I have tested the same capability with a cURL version of that same request. It is formed like this:

    curl --location --request POST 'https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name=New+cards+are+cool'

I can summarize the behavior like this

+------------+---------------+----------------------+---------------+
|            | Local Machine | Previously on Lambda | Now on Lambda |
+------------+---------------+----------------------+---------------+
| cURL       |     GOOD      |         GOOD         |      N/A      |
+------------+---------------+----------------------+---------------+
| HTTP POST  |     GOOD      |         GOOD         |    443 Error  |
+------------+---------------+----------------------+---------------+

Code and Errors
I am not getting a debuggable response. I get a 443, which I presume is the error code, but even that is not clear. Here is the code snippet:

    # send to Trello board
    try:
        http.request("<post string from above>")
    except:
        logger.debug("<post string from above>")

The code never seems to get to the logger.debug() call. I get this in the AWS log:

[DEBUG] 2021-01-19T21:56:24.757Z 729be341-d2f7-4dc3-9491-42bc3c5d6ebf 
Starting new HTTPS connection (1): api.trello.com:443 

I presume the "Starting new HTTPS connection..." log entry is coming from the urllib3 library.
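One way to make this easier to debug, sketched under assumptions: as far as I know, urllib3 sets no timeout of its own and retries failed connections by default, so a call that cannot reach api.trello.com may simply hang until the Lambda itself times out, and the except branch never runs. The helper below (post_card is a hypothetical name) passes an explicit timeout, disables retries, and logs the traceback instead of the URL, since the URL contains the key and token:

```python
import logging

logger = logging.getLogger(__name__)


def post_card(http, url, timeout_s=5.0):
    """POST to url and return the HTTP status, or None if the call failed.

    `http` is assumed to be a urllib3.PoolManager (anything with the same
    request() signature works).
    """
    try:
        # timeout: give up after timeout_s seconds instead of hanging.
        # retries=False: raise on the first failure instead of retrying.
        response = http.request("POST", url, timeout=timeout_s, retries=False)
    except Exception:
        # logger.exception records the traceback, so the real cause is visible.
        logger.exception("Trello request failed")
        return None
    logger.info("Trello responded with HTTP status %s", response.status)
    return response.status
```

When a call does complete, the HTTP status code is available as response.status, which is a more reliable place to look for an error code than the connection log line.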

PROBLEM SUMMARY
I know from testing that my actual API call to the external service is properly formed. At one point it was working well, but now it is not. Previously, in order to get it to work well, I had to fiddle with AWS permissions to allow the response to come back from the service provider. I did it, but I didn't fully understand what I did and I think I was just lucky. Now it's broken and I want to do it in a thoughtful way.

What I'm looking for is an understanding of how to set up the AWS permission structure to enable that return message from the service provider. AWS provides a comprehensive guide to using API Gateway to give others access to services hosted on AWS, but it is much sketchier about how to open permissions for responses from other service providers.

Thanks to the folks at Hava, I have this terrific diagram of the permissions in place for my AWS infrastructure:

[Diagram: Security Structure]

The two nets marked in red are unrelated to this project. The first green check points to one of my EC2 machines and the second points to a related security group.

I'm hoping the community can help me understand which key permission elements (IAM roles, security groups, etc.) are in play and what I need to look for in the related AWS permissions/networking/security structure.


Solution

  • In the end, the problem was not a networking problem at all. The Lambda function simply did not have the right execution role assigned.

    Specifically, Lambda needs AWSLambdaVPCAccessExecutionRole in order to make the basic VPC calls (creating and managing network interfaces) that let it reach all the fancy networking infrastructure shown above.

    Despite its name, this is an AWS managed policy that gets attached to the function's execution role. The default AWS description of the role is Allows Lambda functions to call AWS services on your behalf.

    If you are having this problem, here is how to check this out.

    1. Go to your Lambda function ([Services][Lambda][Functions]) and click on your function.
    2. Go to the Configuration tab. At the right side of the window, select Edit.
    3. If you were like me, you already had a role, but it may have been the wrong one. If you change the role, the console will take a while to reset the settings, even before you hit Save. This is normal.
    4. At the very bottom of the page, right below the role selection, you'll see a link to the role in the IAM control panel. Click on that to check your IAM policies.
    5. Make sure that AWSLambdaVPCAccessExecutionRole is among the policies enabled.
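The same check can be scripted. The sketch below assumes boto3 and working AWS credentials; the helper names and the function name "my-function" are hypothetical. It only looks at managed policies attached directly to the role, so equivalent permissions granted through an inline or custom policy would not be detected:

```python
VPC_ACCESS_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
)


def role_name_from_arn(role_arn):
    """Extract the role name (last path segment) from an IAM role ARN."""
    return role_arn.rsplit("/", 1)[-1]


def has_vpc_access_policy(iam_client, role_name):
    """Return True if the managed policy is attached directly to the role.

    Only the first page of results is inspected; a role with many attached
    policies would need pagination.
    """
    response = iam_client.list_attached_role_policies(RoleName=role_name)
    return any(
        policy["PolicyArn"] == VPC_ACCESS_POLICY_ARN
        for policy in response["AttachedPolicies"]
    )


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# config = boto3.client("lambda").get_function_configuration(FunctionName="my-function")
# print(has_vpc_access_policy(boto3.client("iam"), role_name_from_arn(config["Role"])))
```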

    Red Herrings
    Here are two things that initially led me astray:

    1. I kept seeing 443 come back as what I thought was an error code from the urllib3 call. It was not. I tried a few other things, and my best guess is that it was the port number (443 is the standard HTTPS port), not an error.

    2. I was certain that the lack of access was a networking configuration error, until I tried an experiment that proved to me that it was not. Here is the experiment:

    If you follow all of the guidance you will have the following network setup:

    • One public subnet connected to the internet gateway
    • One private subnet connected to all of your internal organs
    • One NAT gateway that points your private subnet to the IGW
    • A routing table that connects your private subnet to the NAT gateway
    • A routing table that connects your public subnet to the IGW

    THEN, with all of that set up, create a throw-away EC2 instance in your private subnet. When you set it up, it should not have a public IP. You can double check that by trying to use the CONNECT function on the EC2 pages. It should not work.

    If you don't already have it, set up an EC2 in your public subnet. You should have a public IP for that one.

    Now SSH into your public EC2. Once in there, SSH from your public EC2 to your private EC2. If all of your infrastructure is set up correctly, you should be able to do this. Once you're logged into your private EC2, you should be able to ping a public web site from inside that private subnet.
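A small aside on the ping step: ping uses ICMP, while the Lambda's call needs outbound TCP on port 443, so a TCP check is a closer match to what the function actually does. A minimal sketch that could be run from the private EC2 (the function name can_reach is hypothetical):

```python
import socket


def can_reach(host, port=443, timeout_s=3.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage from the private EC2:
# print(can_reach("api.trello.com"))
```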

    The fact that you could not directly connect to your private EC2 tells you that the subnet is secure-ish. The fact that you could reach the internet from that private EC2 tells you that the NAT gateway and routing tables are set up correctly.
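The routing half of this experiment can also be read off the route tables directly. The sketch below classifies where a route table sends its default (0.0.0.0/0) traffic; it works on one entry of the "RouteTables" list returned by the EC2 DescribeRouteTables API. The helper name is hypothetical, and the usage comment assumes boto3 and a subnet with an explicit route table association (a subnet without one falls back to the VPC's main route table, which this filter would not return):

```python
def default_route_target(route_table):
    """Classify where a route table sends 0.0.0.0/0 traffic.

    Returns "nat", "igw", "other", or None when there is no default route.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        return "other"
    return None


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# tables = boto3.client("ec2").describe_route_tables(
#     Filters=[{"Name": "association.subnet-id", "Values": ["<your-private-subnet-id>"]}]
# )["RouteTables"]
# print([default_route_target(t) for t in tables])
```

With the setup listed above, the private subnet's table should come back as "nat" and the public subnet's table as "igw".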

    But of course none of this matters if your Execution Role is not right.

    One Last Thought
    I'm absolutely no expert on this, and I invite the experts to correct me here. I'll edit this as I learn. For those new to these concepts, I strongly recommend taking the time to map out your network on a piece of paper. When I have enough credibility, I'll post the map I drew to help me think through all of this.

    Paul