Python: grequests and requests give different responses


My original task: using the Trello API, get data through HTTP GET requests, and run the requests and process the responses asynchronously if possible. The API provider uses an "https://" URL that I access with a key and a token.
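
For reference, here is a minimal sketch of the kind of call I make (the endpoint and the parameter names are illustrative placeholders; the real key and token are omitted):

import requests

# Illustrative Trello-style GET; the "key" and "token" values are placeholders.
params = {"key": "MY_KEY", "token": "MY_TOKEN"}
resp = requests.get("https://api.trello.com/1/members/me/boards", params=params)
print resp.status_code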

Tools I used:

  • Python 2.7.10 | Anaconda 2.3.0 (64-bit) | (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
  • requests library (just imported without installation)
  • grequests library (installed via pip from this git repo)

Original task result: only the requests library worked; I got Trello API's response, great. The grequests library kept failing with status_code = 302.

I tried to understand why this happens and wrote two reproducible scripts.

Script A: using the requests library:

import requests

urls = [
    "https://www.google.com",
    "https://www.facebook.com/",
    "http://www.facebook.com/",
    "http://www.google.com",
    "http://fakedomain/",
    "http://python-tablib.org"
]

# Run requests:
for url in urls:
    print requests.get(url).status_code

Console output A (with an exception caused by http://fakedomain/):

200
200
200
200
Traceback (most recent call last):
  File "req.py", line 14, in <module>
    print requests.get(url).status_code
  File "D:\python\lib\site-packages\requests\api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "D:\python\lib\site-packages\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "D:\python\lib\site-packages\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\python\lib\site-packages\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "D:\python\lib\site-packages\requests\adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', gaierror(11001, 'getaddrinfo failed'))
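
As a side note, the loop can be made to continue past the bad domain by catching the exception; a quick sketch, reusing a subset of the urls list from Script A:

import requests

urls = [
    "http://www.google.com",
    "http://fakedomain/",
    "http://python-tablib.org"
]

for url in urls:
    try:
        print requests.get(url).status_code
    except requests.exceptions.ConnectionError:
        # DNS failures such as http://fakedomain/ end up here
        print "Request failed : " + url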

Script B: using the grequests library, with map to send the requests asynchronously:

import grequests

# This function will execute a set of instructions when a response comes in:
def proc_response(response, **kwargs):
    # do something ..
    print response

# Request exception handler:
def my_except_handler(request, exception):
    print "Request failed : " + request.url

urls = [
    "https://www.google.com",
    "https://www.facebook.com/",
    "http://www.facebook.com/",
    "http://www.google.com",
    "http://fakedomain/",
    "http://python-tablib.org"
]
# Here is the list of tasks we build and run in parallel later:
actions_list = []

# Tasks list building:
for url in urls:
    action_item = grequests.get(url, hooks = {'response' : proc_response})
    actions_list.append(action_item)

# Run grequests:
print grequests.map(actions_list, exception_handler=my_except_handler)

Console output B:

<Response [302]>
<Response [302]>
<Response [200]>
<Response [301]>
<Response [302]>
<Response [200]>
Request failed : https://www.google.com
Request failed : https://www.facebook.com/
Request failed : http://www.facebook.com/
Request failed : http://fakedomain/
[None, None, None, <Response [200]>, None, <Response [200]>]

All I can conclude from this information and my relatively limited experience is the following: for some reason, grequests gets rejected by remote websites that requests works with normally. Since 302 means a redirect of some kind, it seems that grequests cannot get data from the location it is redirected to, while requests can. Passing allow_redirects=True to the get method in Script B didn't solve the issue.

I wonder why the libraries give different responses. It is possible that I am missing something and the two scripts are supposed to return different results by design, not because of differences between the two libraries.

Thanks for your help in advance.


Solution

  • grequests works well for me

    Here is my script b.py, which I run via $ py.test -sv b.py:

    import pytest
    import grequests
    
    
    @pytest.fixture
    def urls():
        return [
            "https://www.google.com",
            "https://www.facebook.com/",
            "http://www.facebook.com/",
            "http://www.google.com",
            "http://fakedomain/",
            "http://python-tablib.org"
        ]
    
    
    # This function will execute a set of instructions when a response comes in:
    def proc_response(response, **kwargs):
        # do something ..
        print "========Processing response=============", response.request.url
        print response
        if response.status_code != 200:
            print response.request.url
            print response.content
    
    
    # Request exception handler:
    def my_except_handler(request, exception):
        print "Request failed : " + request.url
        print request.response
    
    
    def test_it(urls):
        # Here is the list of tasks we build and run in parallel later:
        actions_list = []
    
        # Tasks list building:
        for url in urls:
            action_item = grequests.get(url, hooks={'response': proc_response})
            actions_list.append(action_item)
    
        # Run grequests:
        print grequests.map(actions_list, exception_handler=my_except_handler)
    

    It is based on your code; it is only rewritten to ease my experimentation.

    Results: the final results are either 200 or None.

    The last printout of my test shows:

    [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, None, <Response [200]>]
    

    This is what is expected.

    Note that you could run into temporary problems fetching the data; there are many parties involved.

    Conclusion: different response processing confused you

    The difference is that with requests you are asking for the final result, while with grequests you register the proc_response hook, which is called for every response, including the redirect ones.

    The requests processing goes through the redirects too, but those intermediate responses are not reported.
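
    If you only want to act on the final responses inside the hook, you can skip the redirect hops explicitly; here is a small sketch along the lines of the code above (is_redirect and history are standard attributes of a requests Response):

    import grequests

    # The hook fires for every response, including redirect hops (301/302);
    # skip those and only act on the final ones.
    def proc_final_response(response, **kwargs):
        if not response.is_redirect:
            print response.request.url, response.status_code

    urls = ["https://www.google.com", "http://www.facebook.com/"]
    actions = [grequests.get(u, hooks={'response': proc_final_response}) for u in urls]

    # The responses returned by map() are the final ones; the redirect chain,
    # if any, is available afterwards on each response's history attribute.
    for resp in grequests.map(actions):
        if resp is not None:
            print resp.url, resp.status_code, [r.status_code for r in resp.history]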