Tags: python, amazon-web-services, python-requests, amazon-ecs, aws-fargate

Mitigating TCP connection resets in AWS Fargate


I am using Amazon ECS on AWS Fargate. My instances can access the internet, but idle connections are dropped after 350 seconds. On average, about 5 out of every 100 calls my service makes fail with ConnectionResetError: [Errno 104] Connection reset by peer. I found a couple of suggestions for fixing this in my server-side code (see here and here):

Cause

If a connection that's using a NAT gateway is idle for 350 seconds or more, the connection times out.

When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).

Solution

To prevent the connection from being dropped, you can initiate more traffic over the connection. Alternatively, you can enable TCP keepalive on the instance with a value less than 350 seconds.
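At the socket level, "enable TCP keepalive with a value less than 350 seconds" comes down to a few setsockopt calls. Below is a minimal sketch with Python's standard socket module, assuming a Linux host (which is what Fargate runs); enable_tcp_keepalive is a hypothetical helper, and the 300 s idle time and 60 s probe interval are illustrative values.

```python
import socket

# AWS NAT gateway idle timeout; the keepalive idle time must stay below it.
NAT_IDLE_TIMEOUT = 350


def enable_tcp_keepalive(sock: socket.socket, idle: int = 300, interval: int = 60) -> None:
    """Enable TCP keepalive so the first probe goes out after `idle` seconds."""
    if idle >= NAT_IDLE_TIMEOUT:
        raise ValueError(f"idle ({idle}s) must be below {NAT_IDLE_TIMEOUT}s")
    # Turn keepalive on for this socket (portable across platforms).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE / TCP_KEEPINTVL are Linux option names; skip them on
    # platforms whose socket module does not define them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of inactivity before the first keepalive probe.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between subsequent probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
```

Each probe counts as traffic on the connection, so the NAT gateway's idle timer is reset before it reaches 350 seconds.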

Existing Code:

url = "url to call http"  # placeholder
params = {
    "year": year,
    "month": month
}
response = self.session.get(url, params=params)

As a band-aid, I am currently retrying the call with tenacity:

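from requests.exceptions import HTTPError
from tenacity import retry, retry_if_not_exception_type, stop_after_attempt, wait_fixed


@retry(
    # Retry on any exception except HTTPError; the one seen in practice is
    # requests.exceptions.ConnectionError (ConnectionResetError, errno 104).
    retry=retry_if_not_exception_type(HTTPError),
    reraise=True,                # re-raise the last exception instead of tenacity.RetryError
    wait=wait_fixed(2),          # wait 2 seconds between attempts
    stop=stop_after_attempt(5),  # give up after 5 attempts
)
def call_to_api(self, year, month):
    url = "url to call HTTP"
    params = {
        "year": year,
        "month": month,
    }
    return self.session.get(url, params=params)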
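If a retry stays in place as a safety net, one alternative to decorating every call is to let urllib3 retry inside the requests transport. This is a sketch; make_retrying_session is a hypothetical helper and the counts are illustrative. It relies on urllib3 reporting a connection reset as a read error (ProtocolError), which Retry re-attempts only for methods in its allowed set (GET is included by default), so non-idempotent calls such as POST are not retried. Like the tenacity version, it hides the resets rather than preventing them.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(attempts: int = 5, backoff: float = 0.5) -> requests.Session:
    """Session whose transport retries connection-level failures such as resets."""
    retry = Retry(
        total=attempts,          # overall cap on retries
        connect=attempts,        # failures while establishing the connection
        read=attempts,           # failures after sending, e.g. "connection reset by peer"
        backoff_factor=backoff,  # exponential sleep between attempts
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Replace the default adapters for both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```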

So my basic question is: how can I use Python requests correctly to do either of the following?

  • Close the connection before 350 seconds of inactivity

  • Enable TCP keepalive on the connections
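For the first option, one client-side sketch is to throw away the session's pooled connections once it has been idle for close to 350 seconds, so the next request opens a fresh TCP connection instead of reusing one the NAT gateway may already have dropped. IdleRecyclingSession is a hypothetical wrapper (not part of requests), the 300-second threshold is illustrative, and it is not thread-safe as written.

```python
import time
from typing import Any, Callable


class IdleRecyclingSession:
    """Hypothetical wrapper that replaces the wrapped session (and its
    connection pool) after `max_idle` seconds without a request."""

    def __init__(
        self,
        factory: Callable[[], Any],
        max_idle: float = 300.0,  # below the 350 s NAT gateway idle timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_idle = max_idle
        self._clock = clock
        self._session = factory()
        self._last_used = clock()

    def _current_session(self) -> Any:
        if self._clock() - self._last_used >= self._max_idle:
            # The pooled sockets may be dead: close them and start a new pool.
            self._session.close()
            self._session = self._factory()
        return self._session

    def get(self, url: str, **kwargs: Any) -> Any:
        session = self._current_session()
        try:
            return session.get(url, **kwargs)
        finally:
            # Measure idle time from the end of the most recent request.
            self._last_used = self._clock()

    def close(self) -> None:
        self._session.close()
```

Usage would be along the lines of api = IdleRecyclingSession(requests.Session) followed by api.get(url, params=params). A blunter variant is to send a "Connection: close" header so no connection is ever reused, at the cost of a new TCP (and TLS) handshake on every request.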


Solution

  • Posting the solution for future users who run into this issue on AWS Fargate + NAT:

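    We need to set the TCP keepalive options on the connections that requests opens (through urllib3) to values that fit our setup, i.e. a keepalive idle time below the NAT gateway's 350-second timeout. This PR helped me a lot in fixing my issue: https://github.com/customerio/customerio-python/pull/70/files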

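    import socket
    from urllib3.connection import HTTPConnection

    # Append keepalive options to urllib3's defaults (TCP_NODELAY). They only
    # apply to connections created after this runs. TCP_KEEPIDLE and
    # TCP_KEEPINTVL are Linux-specific, which is fine on Fargate.
    HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),  # enable TCP keepalive
        (socket.SOL_TCP, socket.TCP_KEEPIDLE, 300),   # first probe after 300 s idle (< 350 s)
        (socket.SOL_TCP, socket.TCP_KEEPINTVL, 60),   # then one probe every 60 s
    ]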
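    A variant that applies the same options to a single requests.Session, instead of patching urllib3's class attribute for the whole process, is to pass socket_options through a custom transport adapter. This is a sketch: KeepAliveAdapter and make_keepalive_session are hypothetical names, and it relies on requests forwarding extra init_poolmanager keyword arguments to urllib3's PoolManager, which hands them to every new connection. Requests sent through a proxy go through a separate code path (proxy_manager_for) and are not covered.

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

# Keepalive options; the TCP_* names are Linux-specific, so add them only
# where the platform defines them.
KEEPALIVE_OPTIONS = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
if hasattr(socket, "TCP_KEEPIDLE"):
    KEEPALIVE_OPTIONS.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300))
if hasattr(socket, "TCP_KEEPINTVL"):
    KEEPALIVE_OPTIONS.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60))


class KeepAliveAdapter(HTTPAdapter):
    """HTTPAdapter whose pooled connections get the keepalive options."""

    def init_poolmanager(self, *args, **kwargs):
        # Keep urllib3's default options and append the keepalive ones.
        kwargs["socket_options"] = (
            list(HTTPConnection.default_socket_options) + KEEPALIVE_OPTIONS
        )
        super().init_poolmanager(*args, **kwargs)


def make_keepalive_session() -> requests.Session:
    session = requests.Session()
    adapter = KeepAliveAdapter()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```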