Tags: python, regex, google-app-engine, google-cloud-dns

Resolving DNSLookupFailedError for Python web proxy


My code is as follows:

https://github.com/T145/tphroxy/blob/master/mirror.py

https://github.com/T145/tphroxy/blob/master/transform_content.py

And when going to certain websites I get errors along these lines:

Traceback (most recent call last):
  File " ... /mirror.py", line 108, in fetch_and_store
    response = urlfetch.fetch(mirrored_url)
  File " ... /google/appengine/api/urlfetch.py", line 293, in fetch
    return rpc.get_result()
  File " ... /google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File " ... /python27_lib/versions/1/google/appengine/api/urlfetch.py", line 449, in _get_fetch_result
    raise DNSLookupFailedError('DNS lookup failed for URL: ' + url)
DNSLookupFailedError: DNS lookup failed for URL: http://public/images/v6/btn_arrow_down_padded_white.png

My guess is that specific asset URL patterns aren't being matched and sent through the proxy properly, i.e. transform_content is missing a pattern. Any help solving this problem is greatly appreciated! I'm open to using alternative libraries if needed.

EDIT

I've added a test suite for transform_content, and based on its results I'm certain the primary problems are with my regular expressions. To see the results on Windows, run it with py transform_content_test.py.


Solution

  • DNS lookup failed for URL: http://public/... Note that the domain (host) portion of the URL is missing, so the string public is parsed as the host. Since public is not a resolvable hostname, the DNS lookup fails with the error you see.

    The URL should be something like http://<valid_domain>/public/..., so check your code building that URL.

    You're doing quite a few string operations on the URLs. Check that every possible code path produces the URL you expect; my guess is that some of them do not.
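
To make the diagnosis concrete, here is a minimal sketch of how the failing URL parses, plus one way to resolve asset references against the page they came from instead of concatenating strings. This is not taken from mirror.py or transform_content.py: the helper name resolve_asset, the example.com page URL and the asset paths other than the one in the traceback are assumptions made for illustration, and the import shown is for Python 3 (on the Python 2.7 runtime the same two functions live in the urlparse module).

```python
# Python 3 import; on Python 2.7 use: from urlparse import urljoin, urlparse
from urllib.parse import urljoin, urlparse

# The URL from the traceback: "public" ends up in the host slot.
bad = urlparse("http://public/images/v6/btn_arrow_down_padded_white.png")
print(bad.netloc)  # public
print(bad.path)    # /images/v6/btn_arrow_down_padded_white.png


def resolve_asset(page_url, asset_ref):
    """Resolve an asset reference against the URL of the page containing it.

    Hypothetical helper: urljoin handles root-relative, document-relative,
    scheme-relative and absolute references in one place.
    """
    return urljoin(page_url, asset_ref)


# Hypothetical page being mirrored.
page = "http://example.com/some/page.html"

# Root-relative reference: keeps the page's host.
print(resolve_asset(page, "/public/images/v6/btn_arrow_down_padded_white.png"))
# http://example.com/public/images/v6/btn_arrow_down_padded_white.png

# Document-relative reference: resolved against the page's directory.
print(resolve_asset(page, "public/images/a.png"))
# http://example.com/some/public/images/a.png

# Scheme-relative reference: keeps the page's scheme, uses the new host.
print(resolve_asset(page, "//cdn.example.com/a.png"))
# http://cdn.example.com/a.png
```

If the resolved URL still has an unexpected host, logging urlparse(url).netloc just before the urlfetch.fetch call should show which code path produced it.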