Tags: java, spring, amazon-web-services, amazon-linux-2, aws-nat-gateway

Connection issues through AWS NAT Gateway


I have an Amazon Linux 2 application server running a Spring Boot application in a private subnet. A NAT gateway in the public subnet sits in front of that server. The application sends a request with a Connection: keep-alive header to a remote host, and the remote host sends its response back with the same header, so I can see an established connection via netstat:

netstat -t | grep <remote server ip>
tcp6       0      0 ip-172-30-4-31.eu:57324 <remote server ip>:http       ESTABLISHED

After 350 seconds without traffic, the NAT gateway closes the connection, as described in this document: https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html#nat-gateway-troubleshooting-timeout However, the connection stays in the ESTABLISHED state on the application server, so the next request to the remote server fails with:

java.net.SocketException: Connection reset

I tried changing sysctl.conf on the application server so that it closes the connection at almost the same time as the NAT gateway:

net.ipv4.tcp_keepalive_time=351
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=2

But nothing changes, and a tcpdump capture of the traffic from the application server to the remote server shows no keep-alive packets. What can I do to avoid this problem, other than removing the Connection header in my application?
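
For reference, the three sysctl values above combine as follows for a socket that has keep-alive enabled and a peer that never answers: the first probe is sent after tcp_keepalive_time seconds of idleness, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped once tcp_keepalive_probes probes have gone unanswered. A minimal Java sketch of that arithmetic (the class and method names are illustrative only, not part of any library):

```java
final class KeepAliveTiming {

    /**
     * Seconds of idleness after which the kernel gives up on a connection
     * whose peer never answers a keep-alive probe.
     */
    static long secondsUntilDropped(long keepAliveTime, long keepAliveIntvl, int keepAliveProbes) {
        // The first probe goes out after keepAliveTime; each unanswered probe
        // is then given keepAliveIntvl seconds before the next step.
        return keepAliveTime + keepAliveIntvl * keepAliveProbes;
    }

    public static void main(String[] args) {
        // Values from the sysctl.conf shown above: 351 + 30 * 2
        System.out.println(secondsUntilDropped(351, 30, 2)); // prints 411
    }
}
```

With the values in the question this gives 411 seconds. A peer that actively rejects the probe (for example with an RST) would end the connection earlier than that.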


Solution

  • The problem was caused by the way the socket was opened. I was using the Apache Fluent API:

    Request.Post(mainProperties.getPartnerURL())
                    .addHeader("Signature", SecurityHelper.getSignature(requestBody.getBytes("UTF-8"),
                            mainProperties.getPartnerKey()))
                    .addHeader("Content-Type", "application/x-www-form-urlencoded")
                    .connectTimeout(mainProperties.getRequestTimeoutMillis())
                    .bodyByteArray(requestBody.getBytes(UTF_8))
                    .execute().returnContent().asString();
    

    But I needed to set the SO_KEEPALIVE option on the socket. This can be done using HttpClient:

        SocketConfig socketConfig = SocketConfig.custom()
                .setSoKeepAlive(true)
                .build();
    
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(mainProperties.getRequestTimeoutMillis())
                .build();
    
        CloseableHttpClient httpClient = HttpClientBuilder.create()
                .setDefaultSocketConfig(socketConfig)
                .setDefaultRequestConfig(requestConfig)
                .build();
                
        HttpPost post = new HttpPost(mainProperties.getPartnerURL());
    
        post.addHeader("Signature", SecurityHelper.getSignature(requestBody.getBytes("UTF-8"),
                    mainProperties.getPartnerKey()));
        post.addHeader("Content-Type", "text/xml");
        post.setEntity(new StringEntity(requestBody, UTF_8));
    
        // try-with-resources closes the response once the body has been read
        try (CloseableHttpResponse response = httpClient.execute(post)) {
            return EntityUtils.toString(response.getEntity(), UTF_8);
        }
    

    With that in place, the net.ipv4.tcp_keepalive_time=350 setting in my sysctl.conf (run sysctl -p to apply the change) is applied to new connections. This can be checked like this:

    netstat -o | grep <remote-host>
    tcp6       0      0 ip-172-30-4-233.e:50414 <remote-host>:http ESTABLISHED keepalive (152.12/0/0)
    

    So a TCP keep-alive packet is sent 350 seconds after the last packet, and when it gets no response the ESTABLISHED connection is closed. All TCP keep-alive packets can be seen with tcpdump:

    [Image: tcpdump capture showing the TCP keep-alive packets]
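
The same point can be checked without HttpClient. The sketch below is only an illustration, assuming a plain java.net.Socket on the loopback interface (the class and method names are made up for this example): it reads SO_KEEPALIVE on a freshly opened socket, enables it, and reads it again. The kernel only starts its keep-alive timer for sockets that have this option enabled, which is why the sysctl values alone had no visible effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SocketKeepAliveDemo {

    /** Returns {SO_KEEPALIVE on a new socket, SO_KEEPALIVE after enabling it}. */
    static boolean[] keepAliveBeforeAndAfter() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; no accept() is needed because
        // the kernel completes the handshake into the listen backlog.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            boolean before = client.getKeepAlive();
            client.setKeepAlive(true); // same effect as setSoKeepAlive(true) in SocketConfig
            boolean after = client.getKeepAlive();
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = keepAliveBeforeAndAfter();
        System.out.println("SO_KEEPALIVE on a new socket: " + result[0]);
        System.out.println("SO_KEEPALIVE after setKeepAlive(true): " + result[1]);
    }
}
```

Once the option is on, the connection should show a keepalive timer in the netstat -o output, as in the listing above.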