Tags: python, session, web-scraping, python-requests, timeout

Timeout within session while sending requests


I'm trying to learn how to use a timeout within a session when sending requests. The approach below does fetch the content of a webpage, but I'm not sure this is the right way as I could not find the usage of timeout in this documentation.

import requests

link = "https://stackoverflow.com/questions/tagged/web-scraping"

with requests.Session() as s:
    r = s.get(link, timeout=5)
    print(r.text)

How can I use a timeout within a session?


Solution

  • I'm not sure this is the right way as I could not find the usage of timeout in this documentation.

    Scroll to the bottom. It's definitely there. You can search for it in the page by pressing Ctrl+F and entering timeout.

    You're using timeout correctly in your code example.

    You can actually specify the timeout in a few different ways, as explained in the documentation:

    If you specify a single value for the timeout, like this:

    r = requests.get('https://github.com', timeout=5)

    The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

    r = requests.get('https://github.com', timeout=(3.05, 27))

    If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.

    r = requests.get('https://github.com', timeout=None)
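
    To make the three forms concrete, here is a minimal sketch of how a timeout argument maps to a (connect, read) pair. The helper split_timeout is hypothetical and written only for illustration; it is not part of Requests and is not meant to mirror its internals.

```python
def split_timeout(timeout):
    """Return (connect, read) for the three timeout forms described above.

    Hypothetical helper for illustration only; not a Requests API.
    """
    if timeout is None:
        # No limit on either phase: wait indefinitely.
        return (None, None)
    if isinstance(timeout, tuple):
        # Separate limits for connecting and for reading the response.
        connect, read = timeout
        return (connect, read)
    # A single number applies to both phases.
    return (timeout, timeout)


print(split_timeout(5))           # single value
print(split_timeout((3.05, 27)))  # (connect, read) tuple
print(split_timeout(None))        # no timeout
```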

    Try using https://httpstat.us/200?sleep=5000 to test your code.

    For example, this raises an exception because the 0.2-second connect timeout (the first value in the tuple) is not long enough to establish a connection with the server:

    import requests
    
    link = "https://httpstat.us/200?sleep=5000"
    
    with requests.Session() as s:
        try:
            r = s.get(link, timeout=(0.2, 10))
            print(r.text)
        except requests.exceptions.Timeout as e:
            print(e)
    

    Output:

    HTTPSConnectionPool(host='httpstat.us', port=443): Read timed out. (read timeout=0.2)
    

    This raises an exception because the server waits 5 seconds before sending the response, which is longer than the 2-second read timeout:

    import requests
    
    link = "https://httpstat.us/200?sleep=5000"
    
    with requests.Session() as s:
        try:
            r = s.get(link, timeout=(3.05, 2))
            print(r.text)
        except requests.exceptions.Timeout as e:
            print(e)
    

    Output:

    HTTPSConnectionPool(host='httpstat.us', port=443): Read timed out. (read timeout=2)
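
    If you want to tell the two cases apart in code, Requests provides ConnectTimeout and ReadTimeout, both of which are subclasses of Timeout. The sketch below uses two illustrative helpers, classify_timeout and fetch, which are my own names and not part of the library:

```python
import requests


def classify_timeout(exc):
    """Name the phase that timed out, based on the exception class."""
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return "connect"
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return "read"
    return "unknown"


def fetch(session, url, timeout):
    """Return the response body, or None if the request timed out."""
    try:
        return session.get(url, timeout=timeout).text
    except requests.exceptions.Timeout as exc:
        # Catching the base class handles both connect and read timeouts.
        print(f"{classify_timeout(exc)} timeout: {exc}")
        return None
```

    You would call it as fetch(s, link, (3.05, 2)) inside the with block. Note that the first output above reports a read timeout even though 0.2 was passed as the connect value, so the reported phase may not always be the one you expected. One possible cause, which is an assumption and not verified here, is the timeout firing during the TLS handshake after the TCP connection was opened.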
    

    You specifically mention using a timeout within a session. So maybe you want a session object which has a default timeout. Something like this:

    import requests
    
    link = "https://httpstat.us/200?sleep=5000"
    
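    class EnhancedSession(requests.Session):
        """Session that applies a default timeout unless the caller passes one."""
    
        def __init__(self, timeout=(3.05, 4)):
            # Initialise the base Session first; __init__ should not return a value.
            super().__init__()
            self.timeout = timeout  # default (connect, read) timeout in seconds
    
        def request(self, method, url, **kwargs):
            print("EnhancedSession request")
            # Fall back to the session default only when no timeout was given.
            kwargs.setdefault("timeout", self.timeout)
            return super().request(method, url, **kwargs)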
    
    session = EnhancedSession()
    
    try:
        response = session.get(link)
        print(response)
    except requests.exceptions.Timeout as e:
        print(e)
    
    try:
        response = session.get(link, timeout=1)
        print(response)
    except requests.exceptions.Timeout as e:
        print(e)
    
    try:
        response = session.get(link, timeout=10)
        print(response)
    except requests.exceptions.Timeout as e:
        print(e)
    
    

    Output:

    EnhancedSession request
    HTTPSConnectionPool(host='httpstat.us', port=443): Read timed out. (read timeout=4)
    EnhancedSession request
    HTTPSConnectionPool(host='httpstat.us', port=443): Read timed out. (read timeout=1)
    EnhancedSession request
    <Response [200]>
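
    Another way to get a session-wide default, sketched here as an alternative to subclassing Session, is a transport adapter that fills in the timeout when none was passed. TimeoutHTTPAdapter and build_session are illustrative names, not part of Requests, and the sketch assumes the session hands the adapter timeout=None whenever the caller omitted it:

```python
import requests
from requests.adapters import HTTPAdapter


class TimeoutHTTPAdapter(HTTPAdapter):
    """Adapter that applies a default timeout when the request has none."""

    def __init__(self, *args, timeout=(3.05, 4), **kwargs):
        super().__init__(*args, **kwargs)
        self.timeout = timeout  # default (connect, read) timeout in seconds

    def send(self, request, **kwargs):
        # An explicit timeout=None is indistinguishable from "not given" here,
        # so it also receives the default.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session(timeout=(3.05, 4)):
    """Create a Session whose HTTP and HTTPS requests share a default timeout."""
    session = requests.Session()
    adapter = TimeoutHTTPAdapter(timeout=timeout)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

    With this in place, build_session().get(link) would use (3.05, 4), while passing timeout=10 to get would keep the explicit value. The trade-off is that timeout=None can no longer mean "wait forever" for such a session.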