Tags: python, urllib2, urlopen

Detecting timeout errors in Python's urllib2 urlopen


I'm still relatively new to Python, so if this is an obvious question, I apologize.

My question is about the urllib2 library and its urlopen function. Currently I'm using it to load a large number of pages from another server (they are all on the same remote host), but the script is killed every now and then by a timeout error (I assume this comes from the large requests).

Is there a way to keep the script running after a timeout? I'd like to fetch all of the pages, so I want a script that keeps trying until it gets a page and then moves on to the next one.

On a side note, would keeping the connection open to the server help?


Solution

  • Next time the error occurs, take note of the error message. The last line will tell you the type of exception. For example, it might be a urllib2.HTTPError. Once you know the type of exception raised, you can catch it in a try...except block. For example:

    import urllib2
    import time

    for url in urls:
        while True:
            try:
                sock = urllib2.urlopen(url)
            except (urllib2.HTTPError, urllib2.URLError) as err:
                # You may want to count how many times you reach here and
                # do something smarter if you fail too many times.
                # If a site is down, pestering it every 10 seconds may not
                # be very fruitful or polite.
                time.sleep(10)
            else:
                # Success
                contents = sock.read()
                # process contents
                break    # break out of the while loop
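
    The comments in the loop above suggest counting failures instead of retrying forever. Here is a minimal sketch of that idea, assuming Python 2's urllib2; the helper name, attempt limit, and wait/timeout values are illustrative rather than part of the original answer. It also passes an explicit timeout= to urlopen and catches socket.timeout, which a slow read may raise directly rather than wrapped in a URLError.

    import socket
    import time
    import urllib2

    def fetch_with_retries(url, max_attempts=5, timeout=30, wait=10):
        # Hypothetical helper: the default values are illustrative only.
        for attempt in range(max_attempts):
            try:
                # An explicit timeout makes a stalled request raise an
                # exception instead of hanging indefinitely.
                sock = urllib2.urlopen(url, timeout=timeout)
                return sock.read()
            except (urllib2.HTTPError, urllib2.URLError, socket.timeout):
                # Wait a little longer after each failure before retrying.
                time.sleep(wait * (attempt + 1))
        return None    # every attempt failed; let the caller decide

    A caller can collect the URLs that still come back as None and retry them later, rather than letting one unreachable page stop the whole run.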