Tags: python-3.x, python-requests

Python Requests - POST timeout specification with large files


Background

I have a Python app, built using requests, that uploads files from client sites to a web server via POST.

These files are usually small (1-300 KB), but sometimes larger (15-20 MB). Usually the uploads take a few seconds; however, large files sent over slow networks may take minutes to complete.

Problem

I'm having trouble figuring out how to use the requests timeout parameter sensibly to handle large uploads sent via POST over slow networks (where the POST may take 1-2 minutes to complete).

What I'd like

I'd like to be able to create a session and then send a POST with it, such that:

a) the initial timeout is small (so network/gateway/other connection problems are detected quickly), BUT

b) a subsequent timeout is long, so that once the connection is established, an upload that takes a few minutes won't time out.

I can't seem to figure out how to do that.

I'm also a bit confused about how, and at which stage, the timeout parameter is used when it is specified as a tuple in conjunction with POST (it looks like I'm not alone: https://stackoverflow.com/a/63994047/9423009).

Specifically, to illustrate (meta code; my production code is below), suppose I have a file to POST that may take 1-2 minutes to upload:

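file_to_upload = '/path_to_a_big_file'
# url is the upload endpoint; 'file' is a placeholder form-field name.
# files= expects a dict (or list of tuples) of field name -> open file, not a bare path.

with open(file_to_upload, 'rb') as fh:
    my_session.post(url, files={'file': fh}, timeout=2)
# above will time out if the POST takes > 2 seconds

with open(file_to_upload, 'rb') as fh:
    my_session.post(url, files={'file': fh}, timeout=60)
# above will succeed if the POST takes 40 seconds, BUT will also take 60 seconds
# to raise an exception for any routine network/gateway (40X-type) problem

with open(file_to_upload, 'rb') as fh:
    my_session.post(url, files={'file': fh}, timeout=(2, 60))
# THIS WILL ALSO TIME OUT AFTER 2 SECONDS!?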

So, given the above, how do you specify a short initial 'make connection' timeout and then a longer, separate timeout for the POST to finish sending?
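To untangle the three calls above: per the requests documentation, a single timeout value is applied to both the connect timeout and the read timeout, while a tuple sets them separately as (connect, read). The helper split_timeout below is hypothetical (it is not part of requests); it only spells out that documented mapping.

```python
from typing import Optional, Tuple, Union

# A requests timeout may be None (wait forever), one number, or a (connect, read) pair.
TimeoutSpec = Union[None, float, Tuple[Optional[float], Optional[float]]]


def split_timeout(timeout: TimeoutSpec) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper: return the (connect, read) pair that a requests
    timeout= argument stands for, per the requests documentation."""
    if isinstance(timeout, tuple):
        # Tuple form: connect and read timeouts are set separately.
        connect, read = timeout
        return connect, read
    # Single value (or None): applied to both the connect and the read timeout.
    return timeout, timeout


for spec in (2, 60, (2, 60)):
    connect, read = split_timeout(spec)
    print(f"timeout={spec!r}: connect={connect}, read={read}")
```

This shows why timeout=60 also delays detection of connection problems: the 60 is used for the connect phase too. Note also that, according to the requests docs, neither value is a limit on the total duration of the request; the read timeout is the time allowed with no bytes received from the server.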

Actual code and Additional Stuff

As the sending sites may have networks of varying speed, and to handle flaky network problems, I use urllib3's Retry to create sessions (courtesy of some great code at https://www.peterbe.com/plog/best-practice-with-retries-with-requests).

With this code, failed requests are retried a set number of times, with an increasing backoff delay between attempts, before giving up. But I don't believe this affects the problem here:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
) -> requests.Session:
    """Return a requests session that uses Retry to retry failed requests automatically."""

    # Retry does not retry POST by default, so list it explicitly
    methods = frozenset({'DELETE', 'GET', 'HEAD', 'OPTIONS', 'PUT', 'POST', 'TRACE'})

    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        status=retries,
        backoff_factor=backoff_factor,
        # allowed_methods is the urllib3 >= 1.26 name; method_whitelist
        # was deprecated in 1.26 and removed in urllib3 2.0
        allowed_methods=methods,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)

    return session

# ...

# send file
with open(file_to_send, 'rb') as fh:
    file_arg = [(server_key, fh)]
    with requests_retry_session() as s:
        # try to specify a small initial and a long subsequent POST timeout,
        # but it doesn't work: if the POST takes > 2 seconds it still
        # times out
        timeout = (2.0, 60.0)

        response = s.post(
            url,
            headers=headers,
            data={},
            files=file_arg,
            timeout=timeout,
            proxies=proxies,
        )
        response.raise_for_status()
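For reference, the backoff_factor passed to Retry in requests_retry_session controls the pause between retry attempts, not the timeout: each retry reuses the same timeout that was given to post(). The sketch below reproduces the sleep schedule that urllib3's documentation describes (no sleep before the first retry, then backoff_factor * 2 ** (n - 1)). backoff_schedule is a hypothetical helper, not urllib3 code, and it ignores the backoff cap and jitter options that some urllib3 versions add.

```python
from typing import List


def backoff_schedule(backoff_factor: float, retries: int) -> List[float]:
    """Hypothetical helper: seconds slept before each retry, following the
    formula in urllib3's Retry docs (cap and jitter are ignored here)."""
    sleeps = []
    for n in range(1, retries + 1):  # n = consecutive failed attempts so far
        # No sleep after the first failure; afterwards the delay doubles.
        sleeps.append(0.0 if n <= 1 else backoff_factor * 2 ** (n - 1))
    return sleeps


print(backoff_schedule(0.3, 3))
```

With the defaults above (backoff_factor=0.3, retries=3) this gives [0.0, 0.6, 1.2], i.e. under two seconds of sleeping in total on top of the attempts themselves, so the retry configuration by itself does not lengthen the per-attempt timeout.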

Solution

  • To clarify: the connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine. It is good practice to set it slightly larger than a multiple of 3, which is the default TCP packet retransmission window.

    Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response.

    So, for this to work as expected, try setting the connect timeout to a value slightly larger than 3, e.g.:

    timeout=(3.05, 60.0)

    See the Timeouts section of the Requests documentation: https://requests.readthedocs.io/en/latest/user/advanced/#timeouts
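To make the connect/read split concrete without a real network, here is a stdlib-only sketch. It does not use requests, and start_slow_server and fetch are hypothetical names: the client connects under one timeout, then switches the socket to a second timeout before waiting for the reply. With requests you would simply pass timeout=(3.05, 60.0); this only illustrates the two phases that tuple names.

```python
import socket
import threading
import time
from typing import Tuple


def start_slow_server(delay: float) -> Tuple[socket.socket, int]:
    """Hypothetical local test server: accepts one connection, reads the
    request, sleeps `delay` seconds, then replies with b"done"."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)

    def handle() -> None:
        conn, _ = srv.accept()
        with conn:
            conn.recv(1024)    # consume the client's request
            time.sleep(delay)  # simulate a server that is slow to answer
            try:
                conn.sendall(b"done")
            except OSError:
                pass  # the client may already have given up and closed

    threading.Thread(target=handle, daemon=True).start()
    return srv, srv.getsockname()[1]


def fetch(port: int, connect_timeout: float, read_timeout: float) -> bytes:
    # Phase 1: connect_timeout bounds the TCP handshake.
    with socket.create_connection(("127.0.0.1", port), timeout=connect_timeout) as sock:
        # In this sketch sendall still runs under connect_timeout,
        # because that is the timeout create_connection left on the socket.
        sock.sendall(b"ping")
        # Phase 2: read_timeout bounds each wait for bytes from the server.
        sock.settimeout(read_timeout)
        return sock.recv(1024)


srv, port = start_slow_server(delay=0.5)
print(fetch(port, connect_timeout=3.05, read_timeout=5.0))
srv.close()
```

This should print b'done', since the server answers well within the 5-second read timeout. Against a server started with delay=1.0, calling fetch(port, 3.05, 0.2) instead raises socket.timeout in phase 2, the rough analogue of requests' ReadTimeout.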