My code is as follows:
https://github.com/T145/tphroxy/blob/master/mirror.py
https://github.com/T145/tphroxy/blob/master/transform_content.py
And when going to certain websites I get errors along these lines:
Traceback (most recent call last):
File " ... /mirror.py", line 108, in fetch_and_store
response = urlfetch.fetch(mirrored_url)
File " ... /google/appengine/api/urlfetch.py", line 293, in fetch
return rpc.get_result()
File " ... /google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File " ... /python27_lib/versions/1/google/appengine/api/urlfetch.py", line 449, in _get_fetch_result
raise DNSLookupFailedError('DNS lookup failed for URL: ' + url)
DNSLookupFailedError: DNS lookup failed for URL: http://public/images/v6/btn_arrow_down_padded_white.png
My guess is that specific asset URL patterns aren't being matched and sent through the proxy properly, i.e. transform_content is missing a pattern. Any help solving this problem is greatly appreciated! I'm open to using alternative libraries if needed.
EDIT
I've added a test suite for transform_content, and from its results I'm certain the primary problems are with my regular expressions. Run it with py transform_content_test.py if you're on Windows to see the results.
DNS lookup failed for URL: http://public/...
Note the missing domain (host) portion in the URL: the string public will be parsed as the domain, which is invalid, causing the error you see.
The URL should be something like http://<valid_domain>/public/..., so check the code that builds that URL.
You're doing quite a few string operations on the URLs. Check that every possible code path behaves properly; my guess is that some are not doing what you expect.
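To illustrate the point about the host portion, here is a minimal sketch using the standard library's URL helpers rather than manual string operations. The page URL (example.com) and the root-relative asset reference are assumptions made up for illustration; they are not taken from mirror.py or transform_content.py, and this is not necessarily how the proxy should be restructured.

```python
try:
    from urllib.parse import urljoin, urlsplit  # Python 3
except ImportError:
    from urlparse import urljoin, urlsplit  # Python 2.7

# The failing URL from the traceback: "public" ends up in the host slot.
bad_url = "http://public/images/v6/btn_arrow_down_padded_white.png"
bad_host = urlsplit(bad_url).netloc
print(bad_host)

# Hypothetical values, for illustration only: the page being mirrored
# and a root-relative asset reference found inside that page.
page_url = "http://example.com/some/page.html"
asset_ref = "/public/images/v6/btn_arrow_down_padded_white.png"

# urljoin resolves the reference against the page URL and keeps the host.
fixed_url = urljoin(page_url, asset_ref)
print(fixed_url)
```

Resolving every extracted reference against the URL of the page it came from, before rewriting it for the proxy, may be less error-prone than adding one more regex per URL shape.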