python-2.7, urllib2, urlopen, eventlet

How to fix Python urlopen error [Errno 8] when using eventlet.green


Python novice here.

I'm making a lot of asynchronous HTTP requests using eventlet and urllib2. At the top of my file I have:

import eventlet
import urllib
from eventlet.green import urllib2

Then I make a lot of asynchronous HTTP requests, which succeed, with this line:

conn = urllib2.urlopen(signed_url, None)

And all of a sudden, I get this error:

URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

This error occurs on the same urllib2.urlopen line, which is weird because that line succeeded many times before. Also, when I print signed_url and paste it into my browser, I get a proper response, so the URL is properly formatted.

I've bounced around between posts, but cannot find the right debugging strategy for this. Conceptually, what could be causing this error? And how do you recommend I go about fixing it?

I'm using Python 2.7.6.

Thank you.


Solution

  • The "nodename nor servname provided, or not known" error means DNS resolution failed. The most likely cause is a rate limit on your upstream DNS server. If you are doing serious web crawling, I can recommend two approaches:

    • easy: when you get this error, lower your concurrency limit so that you make fewer requests per minute. Treat the first N occurrences of the error as temporary and retry the URL after a short delay. Set up a local caching recursive DNS server (e.g. dnsmasq, unbound).
    • hard: split DNS resolution from HTTP fetching. Keep a separate queue of DNS names to resolve. Pass the resolved IP address in the URL (http://1.2.3.4/path) together with a Host: domain header to urlopen. This lets you limit the concurrency of DNS requests and of actual HTTP requests separately. It will not help if you mostly fetch only one URL per unique host, since every host still needs its own lookup. Find several recursive DNS servers to spread the work across, collect their response time stats, and use the faster ones more often.
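
The "easy" approach could look roughly like the sketch below. It is only an illustration, not a tested crawler: fetch_with_retry, is_dns_failure, max_retries and delay are names and defaults made up for this example, and it assumes the DNS failure reaches you as a URLError whose reason is a socket.gaierror, which is what the message in the question suggests.

```python
import socket
import time

try:
    from urllib2 import URLError  # Python 2, as in the question
except ImportError:
    from urllib.error import URLError  # Python 3 equivalent


def is_dns_failure(err):
    """True if a URLError wraps a name-resolution error (socket.gaierror)."""
    return isinstance(getattr(err, "reason", None), socket.gaierror)


def fetch_with_retry(fetch, url, max_retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying only on DNS failures.

    The first max_retries DNS failures are treated as temporary, and the
    wait grows linearly (delay, 2 * delay, ...). Any other error, or a DNS
    failure after the retries are used up, is re-raised to the caller.
    """
    failures = 0
    while True:
        try:
            return fetch(url)
        except URLError as err:
            if not is_dns_failure(err) or failures >= max_retries:
                raise
            failures += 1
            # Back off a little longer after each consecutive failure.
            sleep(delay * failures)
```

In the setup from the question, fetch would be a small function that calls the green urllib2.urlopen(url, None) and reads the response, and sleep should be eventlet.sleep, so that waiting yields to other green threads instead of blocking them all. To lower concurrency, run the calls through an eventlet.GreenPool created with a small size and reduce that size if the errors keep coming.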