My original task: using the Trello API, get data through HTTP GET requests, running the requests and processing the responses asynchronously if possible. The API provider uses an "https://" URL that I access with a key and token.
Tools I used:
- the requests library (just imported, no installation needed)
- the grequests library (installed via pip from this git repo)

Original task result: only the requests library worked; I got the Trello API's response, great. The grequests library was failing with status_code = 302.
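For context, the Trello call itself is a plain GET against an https endpoint; here is a minimal sketch of its shape (the board id, key, and token below are placeholders, not real credentials, and the helper name is my own):

```python
# Placeholder credentials; real ones come from Trello's developer pages.
TRELLO_KEY = "your-api-key"
TRELLO_TOKEN = "your-api-token"

def board_url(board_id):
    # Trello's REST endpoint for a single board.
    return "https://api.trello.com/1/boards/%s" % board_id

# The key and token travel as query-string parameters:
params = {"key": TRELLO_KEY, "token": TRELLO_TOKEN}

# With requests this becomes:
#   response = requests.get(board_url("some-board-id"), params=params)
#   print response.status_code
```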
I tried to understand why it happens and wrote two reproducible scripts.
Script A, using the requests library:
import requests

urls = [
    "https://www.google.com",
    "https://www.facebook.com/",
    "http://www.facebook.com/",
    "http://www.google.com",
    "http://fakedomain/",
    "http://python-tablib.org"
]

# Run requests:
for url in urls:
    print requests.get(url).status_code
Console output A (with an exception caused by http://fakedomain/):
200
200
200
200
Traceback (most recent call last):
File "req.py", line 14, in <module>
print requests.get(url).status_code
File "D:\python\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "D:\python\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "D:\python\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "D:\python\lib\site-packages\requests\sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "D:\python\lib\site-packages\requests\adapters.py", line 415, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', gaierror(11001, 'getaddrinfo failed'))
Script B, using the grequests library with map to send the requests asynchronously:
import grequests

# This function will execute a set of instructions when responses come:
def proc_response(response, **kwargs):
    # do something ..
    print response

# Request exception handler:
def my_except_handler(request, exception):
    print "Request failed : " + request.url

urls = [
    "https://www.google.com",
    "https://www.facebook.com/",
    "http://www.facebook.com/",
    "http://www.google.com",
    "http://fakedomain/",
    "http://python-tablib.org"
]

# Here is the list of tasks we build and run in parallel later:
actions_list = []
# Task list building:
for url in urls:
    action_item = grequests.get(url, hooks={'response': proc_response})
    actions_list.append(action_item)
# Run grequests:
print grequests.map(actions_list, exception_handler=my_except_handler)
Console output B :
<Response [302]>
<Response [302]>
<Response [200]>
<Response [301]>
<Response [302]>
<Response [200]>
Request failed : https://www.google.com
Request failed : https://www.facebook.com/
Request failed : http://www.facebook.com/
Request failed : http://fakedomain/
[None, None, None, <Response [200]>, None, <Response [200]>]
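As the last line shows, grequests.map returns one entry per request, in input order, with None standing in for a request that failed. A tiny helper (my own naming, not part of grequests) makes that pairing explicit:

```python
def pair_results(urls, responses):
    # grequests.map keeps input order, so zip lines each URL up
    # with its result; failed requests show up as None.
    return list(zip(urls, responses))
```

For example, pair_results(urls, grequests.map(actions_list, exception_handler=my_except_handler)) tells you which URL each None belongs to.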
All I can conclude from this, with my relatively small experience, is the following: for some reason grequests is rejected by remote websites that requests works with normally. Since 302 means a redirect of some kind, it seems that grequests cannot get data from the source it is redirected to, while requests can. Passing allow_redirects=True to the get calls in Script B didn't solve the issue.
I wonder why the libraries give different responses. It is possible that I am missing something, and the two scripts are supposed to return different results by design, not because of differences between the two libraries.
Thanks for your help in advance.
Here is my script b.py, which I run via $ py.test -sv b.py:
import pytest
import grequests

@pytest.fixture
def urls():
    return [
        "https://www.google.com",
        "https://www.facebook.com/",
        "http://www.facebook.com/",
        "http://www.google.com",
        "http://fakedomain/",
        "http://python-tablib.org"
    ]

# This function will execute a set of instructions when responses come:
def proc_response(response, **kwargs):
    # do something ..
    print "========Processing response=============", response.request.url
    print response
    if response.status_code != 200:
        print response.request.url
        print response.content

# Request exception handler:
def my_except_handler(request, exception):
    print "Request failed : " + request.url
    print request.response

def test_it(urls):
    # Here is the list of tasks we build and run in parallel later:
    actions_list = []
    # Task list building:
    for url in urls:
        action_item = grequests.get(url, hooks={'response': proc_response})
        actions_list.append(action_item)
    # Run grequests:
    print grequests.map(actions_list, exception_handler=my_except_handler)
It is based on your code, only rewritten to ease my experimentation.
Last printout of my test shows:
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, None, <Response [200]>]
This is what is expected.
Note that you could have some temporary problems fetching the data, since there are many players involved.
The difference is that with requests you are asking for the final result, while with grequests you deploy the proc_response hook, which is called for each response, including the redirect ones. The requests processing goes through the redirects too, but those intermediate responses are not reported.
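A sketch of that difference: with plain requests the intermediate hops are still recorded, just tucked away in response.history, while a grequests hook sees them one by one, so the hook itself must decide whether a given response is final. The is_final helper below is my own, and the redirect-code list is an assumption covering the common cases:

```python
# Status codes that mark an intermediate hop rather than a final answer
# (assumed list of the common redirect codes).
REDIRECT_CODES = frozenset([301, 302, 303, 307, 308])

def is_final(response):
    # True when the response is the end of the redirect chain.
    return response.status_code not in REDIRECT_CODES

def proc_response(response, **kwargs):
    # The hook fires for every hop; skip redirects, act on the final one.
    if is_final(response):
        print(response.status_code)

# With plain requests the skipped hops can be inspected after the fact:
#   r = requests.get("http://www.google.com")
#   print([h.status_code for h in r.history])
```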