Search code examples
amazon-web-servicesaws-security-groupaws-network-acl

AWS Network ACS causes SQL Server to disconnect


I have two EC2 serers in the same VPC. Server "W" hosts a web server and a TCP connections server. Server "S" hosts a web server and a SQL Server. Both web servers and the TCP connections server uses the same SQL Server. The servers on server "W" accesses the SQL Server on server "S" via its priviate IP and port 1433:

enter image description here

Server W's security group's inbound rules:

enter image description here

Server S's security group's inbound rules:

enter image description here

It has an inbound rule allowing port 1433 from the security group used by server "W".

The common network ACS's inbound rules:

enter image description here

My intension is to hide the SQL Server within the VPC and do not expose it to outside world.

What is strange and I can't understand, is that on the network ACL I have to allow inbound connection on port 1024-65535. If I change it to the proper ephemeral port range 32768-65535, both the web server and the TCP connections on server "W" can no longer access the SQL Server, and the following exception is thrown in server "W":

SqlException - A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) innerException Win32Exception - The network path was not found.

So it seems as if the SQL Server needs to allow an inbound connection from outside of the VPC, on a port between 1024-32768. But no one outside of the VPC needs to access the SQL Server. This is what puzzles me.


Solution

  • Network ACLs apply at subnet level. When a NACL is attached to a subnet, all internal/external traffic into and out of that subnet must be allowed through a rule. Security Groups apply at instance level, so essentially to inbound traffic that have already passed through the NACL (for outbound traffic it's in reverse, so SG before NACL).

    In this case when server W tries to connect to server S on port 1433, that traffic encounters the deny all rule on your NACL since there are no other rules before it that match. It doesn't even reach the security group. When you add an allow rule for port 1024-65535, obviously this includes port 1433 so it works. You can confirm this behavior using the VPC Reachability Analyzer tool.

    So the fix here is basically to have an allow rule for port 1433 on the NACL attached to your Server S subnet. This doesn't mean automatically allowing traffic from outside of the VPC - if this is a private subnet (ie. without an internet gateway) external traffic can't reach your instance plus your security group already restricts access.