I am trying to use proxybroker to generate a file with active proxies for certain countries. I always get the same error when trying to fetch the proxies. The error seems to be an encoding/decoding error in a package used by proxybroker, but I suspect there might be better ways to use proxybroker.
This is the code that causes problems:
import asyncio

from proxybroker import Broker


def gather_proxies(countries):
    """
    This method uses the proxybroker package to asynchronously get two new proxies per specified country
    and returns the proxies as a list of country and proxy.
    :param countries: The ISO style country codes to fetch proxies for. Countries is a list of two letter strings.
    :return: A list of proxies that are themselves a list with two parameters [Location, proxy address].
    """
    proxy_list = []
    types = ['HTTP']
    for country in countries:
        loop = asyncio.get_event_loop()
        proxies = asyncio.Queue(loop=loop)
        broker = Broker(proxies, loop=loop)
        loop.run_until_complete(broker.find(limit=2, countries=country, types=types))
        # Drain the queue; the broker puts None on it once the search is finished.
        while True:
            proxy = proxies.get_nowait()
            if proxy is None:
                break
            print(str(proxy))
            proxy_list.append([country, proxy.host + ":" + str(proxy.port)])
    return proxy_list
and the error message:
../app/main/download_thread.py:344: in update_proxies
proxy_list = gather_proxies(country_list)
../app/main/download_thread.py:368: in gather_proxies
loop.run_until_complete(broker.find(limit=2, countries=country, types=types))
/usr/lib/python3.5/asyncio/base_events.py:387: in run_until_complete
return future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:241: in _step
result = coro.throw(exc)
../venv/lib/python3.5/site-packages/proxybroker/api.py:108: in find
await self._run(self._checker.check_judges(), action)
../venv/lib/python3.5/site-packages/proxybroker/api.py:114: in _run
await tasks
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__
yield self # This tells Task to wait for completion.
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup
future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:241: in _step
result = coro.throw(exc)
../venv/lib/python3.5/site-packages/proxybroker/checker.py:26: in check_judges
await asyncio.gather(*[j.check() for j in self._judges])
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__
yield self # This tells Task to wait for completion.
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup
future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:239: in _step
result = coro.send(None)
../venv/lib/python3.5/site-packages/proxybroker/judge.py:62: in check
page = await resp.text()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <ClientResponse(http://ip.spys.ru/) [200 OK]>
<CIMultiDictProxy('Date': 'Thu, 18 Aug 2016 11:02:53 GMT', 'Server': 'Ap...': 'no-cache', 'Vary': 'Accept-Encoding', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html; charset=UTF-8')>
encoding = 'utf-8'
    @asyncio.coroutine
    def text(self, encoding=None):
        """Read response payload and decode."""
        if self._content is None:
            yield from self.read()
        if encoding is None:
            encoding = self._get_encoding()
>       return self._content.decode(encoding)
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 5568: invalid continuation byte
../venv/lib/python3.5/site-packages/aiohttp/client_reqrep.py:758: UnicodeDecodeError
The problem seems to be within proxybroker, or rather the aiohttp package. But since these are supposedly tested packages, the problem is probably in my code.
Can anyone see what I did wrong, or does anyone have a suggestion regarding the usage of proxybroker?
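The problem is in the resp.text() call, not in your code.
It retrieves the HTML page as text. aiohttp takes the encoding from the Content-Type header and falls back to detecting it with the chardet library, but for a malformed page that is not possible. Here the page from ip.spys.ru declares charset=UTF-8 yet contains a byte (0xd6) that is not valid UTF-8, so decoding fails.
I think resp.text() should be replaced with resp.read() to fetch the page as bytes, without decoding to str.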
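A minimal sketch of what that could look like, assuming the body has already been fetched as bytes (for example with raw = await resp.read() inside the coroutine). The helper name decode_page and the sample body are made up for illustration; they are not part of aiohttp or proxybroker. It tries a strict decode first and only falls back to replacing undecodable bytes when the page is malformed:

```python
def decode_page(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Decode a response body, tolerating bytes that do not match the declared charset."""
    try:
        return raw.decode(declared_encoding)
    except UnicodeDecodeError:
        # Malformed page: keep everything that decodes cleanly and
        # substitute U+FFFD for each byte sequence that does not.
        return raw.decode(declared_encoding, errors="replace")


# Hypothetical body: declared as UTF-8 but containing a stray 0xd6 byte,
# the same byte value reported in the traceback.
body = b"<html>REMOTE_ADDR = \xd61.2.3.4</html>"
page = decode_page(body)
print(page)
```

Since the failing call sits in proxybroker's judge.py rather than in the calling code, applying this would mean patching or upgrading the library; the calling code in the question cannot work around it on its own.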