amazon-web-services, amazon-ec2, containers, amazon-ecs

AWS Trusted Advisor and ephemeral ports


I get "Action recommended" (red !) from AWS Trusted Advisor when I open the ephemeral ports (1024-65535) in a security group to allow communication between an ALB and the EC2 Container Service. Is this something I should be worried about, or should I not trust AWS Trusted Advisor here?


Solution

  • Original Answer

    Security groups are stateful, meaning that when traffic is initiated from the instance to another host, all return traffic related to that outbound request (i.e. traffic to the ephemeral ports) is allowed. It's really the NACLs in a VPC where you have to explicitly allow ephemeral traffic, as they are not stateful and don't understand return traffic the way security groups do.

    That said, for ALB -> instance traffic you won't need to open those ports in the security group, because the security group will allow traffic initiated from the ALB (to the instance) as well as the related ephemeral port traffic going back to the ALB.

    Your instances simply need an inbound rule for whatever port is being checked (80, 8080, etc.), since that traffic comes from the outside. They don't need anything to allow outbound traffic to the ALB's ephemeral ports, since that traffic is tied to the connection already allowed on the incoming port.
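
    To make the port roles concrete, here is a small sketch that opens a TCP connection on the loopback interface and prints the ports on each side. The listener stands in for the instance and the connecting socket stands in for the ALB; both roles and the function name `observe_ports` are illustrative assumptions, and no AWS service is involved.

```python
import socket


def observe_ports():
    """Open one loopback TCP connection and report the ports on each side."""
    # Stand-in for the instance: a listener on a single service port.
    # Port 0 asks the OS for any free port, just to keep the demo portable.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    service_port = server.getsockname()[1]

    # Stand-in for the ALB: it connects out without binding a port itself,
    # so the OS assigns an ephemeral source port.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", service_port))
    conn, peer_address = server.accept()

    ports = {
        "service_port": service_port,
        "client_source_port": client.getsockname()[1],
        "source_port_seen_by_server": peer_address[1],
    }
    for sock in (conn, client, server):
        sock.close()
    return ports


if __name__ == "__main__":
    print(observe_ports())
```

    The listener only ever receives on its one service port; the ephemeral port belongs to the connecting side, which is why an inbound rule on the instance is matched against the service port rather than the whole ephemeral range.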

    Edit:

    After a lot of experimenting with an EC2 instance to try and explain this, I found a few faults in the original explanation. I'll leave the original explanation here, as I think it's important to know that mistakes happen.

    At any rate, let's go for the more in-depth answer.

    NACL (Network Access Control Lists)

    These are stateless firewalls: a NACL has no idea that the outgoing ephemeral port traffic is related to the incoming HTTP traffic. It is also a priority-based system. You number your rules in the order you want them evaluated, lowest to highest, and the first rule that matches the traffic is applied. You can also explicitly deny traffic.
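
    As a rough illustration of that evaluation order, the sketch below models a NACL as a numbered rule list where the first match wins and anything unmatched is denied. `NaclRule` and `evaluate_nacl` are hypothetical names for this toy model, and it only looks at a single port, whereas real NACL rules also match on protocol and CIDR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # evaluation priority, lowest first
    port_from: int
    port_to: int
    allow: bool      # False models an explicit DENY rule


def evaluate_nacl(rules, port):
    """Return True if the port is allowed by the first matching rule."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # nothing matched: fall through to the implicit deny


# Inbound and outbound lists are evaluated independently (stateless).
inbound = [NaclRule(100, 80, 80, True)]
outbound_without_ephemeral = []
outbound_with_ephemeral = [NaclRule(100, 1024, 65535, True)]

# A lower-numbered DENY takes effect before a broader ALLOW.
deny_first = [
    NaclRule(200, 1024, 65535, True),
    NaclRule(100, 3389, 3389, False),
]

if __name__ == "__main__":
    # The HTTP request gets in, but the reply to the client's ephemeral
    # port (50000 here) is dropped unless the outbound list allows it.
    print(evaluate_nacl(inbound, 80))
    print(evaluate_nacl(outbound_without_ephemeral, 50000))
    print(evaluate_nacl(outbound_with_ephemeral, 50000))
    print(evaluate_nacl(deny_first, 3389))
```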

    The main disadvantage here is that a NACL only allows 20 rules each way by default (40 in total), whereas security groups allowed 50 rules each way (100 in total) when this was written. These limits are quotas that change over time and can be raised, so check the current VPC quotas. That said, if you start to run out of security group rules for whatever reason, it's always possible to take common traffic rules and apply them to the NACL. NACLs are also something to consider in high-compliance environments where you absolutely must block certain traffic, as explicit DENY rules are possible, whereas security groups only support permissive rules.

    Security Groups

    Security groups, unlike NACLs, can only have permissive rules. DENY is simply the lack of a permissive rule. However, under certain circumstances explained below, security groups will track traffic and automatically permit the related traffic in the other direction.

    Security groups by default have a rule that allows all outbound traffic. The idea is that traffic initiated from your instance is fine in the large majority of use cases. However, if an attacker gets access to the system through a service exploit, they would be able to send outbound traffic pretty much wherever they want.

    What you could do here is remove the outbound traffic rule in your security group. In this case you would have the following:

    • Traffic originating from the instance would be denied
    • If an incoming connection was accepted by an inbound rule, the outbound response traffic would be allowed regardless of the lack of outbound rules
    • If an outbound rule was added (say port 80), then a call out from the instance to an external server on port 80 would be allowed. Incoming traffic related to that connection would also be allowed.

    Security groups also track connections (which is why they are called stateful) so that related traffic in the other direction is allowed automatically. However, a connection is only tracked if the traffic would otherwise be denied.

    For example, if you didn't remove the outbound rule that allows all access, the security group would have no need to be stateful, as there are no rules to add. It does, however, need to be stateful when the traffic would otherwise not be allowed. There's no real solid documentation I can find on how it does this, but I theorize that it's built around the three-way TCP handshake. Essentially, it starts allowing traffic in the other direction when a SYN comes in or goes out to an allowed port. It then fully tracks the connection once the rest of the handshake (SYN+ACK -> ACK) is completed. When connection-close packets arrive, it potentially removes the tracking.
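
    The three cases listed earlier can be sketched as a toy model of allow-only rules plus a table of tracked flows. `ToySecurityGroup` is a hypothetical class for illustration only: it is not how AWS implements tracking (which, as noted, is not something I could find documented), and it ignores protocols, CIDRs, the handshake and timeouts.

```python
class ToySecurityGroup:
    """Allow-only port rules plus a table of tracked connections."""

    def __init__(self, inbound_ports, outbound_ports):
        self.inbound_ports = set(inbound_ports)    # allowed local ports
        self.outbound_ports = set(outbound_ports)  # allowed remote ports
        self.tracked = set()  # (remote_ip, remote_port, local_port)

    def incoming(self, remote_ip, remote_port, local_port):
        flow = (remote_ip, remote_port, local_port)
        if flow in self.tracked:
            return True  # reply on a connection already being tracked
        if local_port in self.inbound_ports:
            self.tracked.add(flow)  # remember it so replies can leave
            return True
        return False  # no permissive rule, so the packet is dropped

    def outgoing(self, remote_ip, remote_port, local_port):
        flow = (remote_ip, remote_port, local_port)
        if flow in self.tracked:
            return True  # reply on a connection already being tracked
        if remote_port in self.outbound_ports:
            self.tracked.add(flow)  # remember it so replies can enter
            return True
        return False


if __name__ == "__main__":
    # Inbound port 80 allowed, default outbound rule removed.
    sg = ToySecurityGroup(inbound_ports={80}, outbound_ports=set())
    print(sg.outgoing("203.0.113.10", 443, 40000))  # instance-initiated
    print(sg.incoming("10.0.1.5", 51000, 80))       # accepted request
    print(sg.outgoing("10.0.1.5", 51000, 80))       # its response
```

    In this model the response leaves even though there is no outbound rule, because the flow was recorded when the inbound request was accepted.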

    With this in mind, it's best to be more permissive with outgoing traffic where possible when dealing with high-capacity, front-facing services, as I can imagine the tracking starting to slow things down noticeably.

    Recommendations

    • Kill the NACL rules and just allow all traffic in and out. Let the stateful security groups handle things for you.
    • Put the instances behind the ALB in private subnets. That will block outside traffic since there will be no route.
    • However you'll want a NAT Gateway that lets your private instances reach out to the internet for important things like getting package updates from distro servers.
    • Security group for backend instances: allow inbound traffic on whatever port the ALB forwards requests to. Allow all outbound traffic.
    • Security group for ALB: Allow inbound traffic for whatever port (80 or 443 I would assume) and allow all outbound traffic.
    • Create what's called a bastion instance. It's simply an EC2 instance that only allows SSH (or RDP for Windows instances). You use this as your gateway to log in to private subnet instances. Its security group should allow all outbound traffic, and allow inbound SSH traffic only from the IPs that are authorized to access it. This is very important because, if you don't restrict IPs, random bots scanning the Amazon public IP space (usually from China or Russia, which have a huge IP space) will randomly try to connect to port 22. You just don't want to deal with that, especially since the possibility of a remote login exploit is always greater than 0%.
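
    The inbound rules from these recommendations can be sketched as plain data. The dictionaries below follow the `IpPermissions` shape accepted by boto3's EC2 `authorize_security_group_ingress`, but nothing here calls AWS. The group IDs, the backend port 8080 and the office IP are made-up placeholders, and restricting the backend to the ALB's and bastion's security groups as sources is my assumption rather than something stated above.

```python
def ingress_from_cidrs(port, cidrs):
    """Inbound TCP rule on one port, open to the given CIDR blocks."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr} for cidr in cidrs],
    }


def ingress_from_security_group(port, source_group_id):
    """Inbound TCP rule on one port, open to members of another group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


# Hypothetical identifiers, for illustration only.
ALB_SG = "sg-alb-example"
BASTION_SG = "sg-bastion-example"
OFFICE_IP = "203.0.113.25/32"

plan = {
    # ALB: public HTTPS.
    "alb": [ingress_from_cidrs(443, ["0.0.0.0/0"])],
    # Backend: only the port the ALB forwards to, plus SSH from the bastion.
    "backend": [
        ingress_from_security_group(8080, ALB_SG),
        ingress_from_security_group(22, BASTION_SG),
    ],
    # Bastion: SSH from authorized addresses only.
    "bastion": [ingress_from_cidrs(22, [OFFICE_IP])],
}

if __name__ == "__main__":
    for name, rules in plan.items():
        print(name, rules)
    # Applying a list would look roughly like this with boto3:
    #   ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=rules)
```

    Every rule in this plan covers a single port, so none of the groups opens the 1024-65535 range.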