
How to use Splash with python-requests?


I want to use Splash with requests, something like this:

requests.post(myUrl,headers=myHeaders, data=payload, meta={
                                        'splash': {
                                            'endpoint': 'render.html',
                                            'args': {'wait': 1}
                                            }
                                        })

but I get this error:

TypeError: request() got an unexpected keyword argument 'meta'

I know that this works with scrapy.Request, but I want to use it with requests.


Solution

  • meta is specific to Scrapy's Request; python-requests' request() has no meta argument, hence the TypeError.

    To use Splash with python-requests, read the Splash HTTP API docs, especially the section on render.html, which appears to be the endpoint you want.

    Send a GET request to the /render.html endpoint and pass the target URL and the wait argument as query parameters, for example:

    import requests
    requests.get('http://localhost:8050/render.html',
                 params={'url': 'http://www.example.com', 'wait': 2})
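
The call above returns a requests Response whose body is the rendered page. A minimal sketch of reading it, assuming Splash listens on localhost:8050 as in the answer. The function name fetch_rendered_html and the optional session parameter are hypothetical, added only so the helper can be tried without a running Splash instance:

```python
SPLASH_RENDER_URL = 'http://localhost:8050/render.html'  # assumed local Splash


def fetch_rendered_html(target_url, wait=2, session=None):
    """Return the HTML of target_url after Splash has rendered it."""
    if session is None:
        # Fall back to the module-level requests API.
        import requests
        session = requests
    response = session.get(
        SPLASH_RENDER_URL,
        params={'url': target_url, 'wait': wait},
    )
    # Splash signals failures (e.g. a bad argument or a timeout) with
    # HTTP error status codes; raise_for_status() raises on those.
    response.raise_for_status()
    return response.text
```

Calling fetch_rendered_html('http://www.example.com') would then give the rendered HTML as a string, or raise requests.HTTPError if Splash answered with an error status.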
    

    If you want Splash to issue a POST request to the target website, use the http_method and body arguments:

    import requests
    # GET to Splash; http_method tells Splash to POST 'a=b' to the target.
    requests.get('http://localhost:8050/render.html',
                 params={'url': 'http://httpbin.org/post',
                         'http_method': 'POST',
                         'body': 'a=b',
                         'wait': 2})
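
In the question the POST data is a dict (payload), whereas the body argument here is a string. A small sketch of encoding such a dict, assuming Python 3 and a target that expects form-encoded data (which, as far as the Splash docs indicate, is the default content type Splash uses for POST). The helper name and the sample payload are made up for illustration:

```python
from urllib.parse import urlencode


def splash_post_params(target_url, payload, wait=2):
    """Build query parameters asking Splash to POST payload to target_url."""
    return {
        'url': target_url,
        'http_method': 'POST',
        # Splash takes the body as a string, so form-encode the dict.
        'body': urlencode(payload),
        'wait': wait,
    }


params = splash_post_params('http://httpbin.org/post', {'a': 'b'})
print(params['body'])  # a=b
```

The resulting dict can be passed as params= to requests.get('http://localhost:8050/render.html', ...) exactly as in the answer's example.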
    

    /render.html also accepts POST requests to the endpoint itself. As the Splash docs put it:

    "Splash is controlled via HTTP API. For all endpoints below parameters may be sent either as GET arguments or encoded to JSON and POSTed with Content-Type: application/json header."

    However, the method Splash uses to fetch the target website is still GET by default. To make Splash POST to the target website, you still need to include an http_method argument:

    import requests
    
    # POST the arguments to Splash as JSON; Splash then POSTs 'a=b' to the target.
    requests.post('http://localhost:8050/render.html',
                  json={'url': 'http://httpbin.org/post',
                        'http_method': 'POST',
                        'body': 'a=b',
                        'wait': 2})
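
Sending the arguments as JSON also allows values that are not plain strings. To my knowledge, Splash's headers argument (custom headers for the first outgoing request) is only honoured for such application/json requests, which would cover the myHeaders dict from the question. A hedged sketch under that assumption; the function name and the sample header value are hypothetical:

```python
import json


def splash_json_payload(target_url, headers=None, body=None, wait=2):
    """Build the JSON document to POST to Splash's /render.html."""
    payload = {'url': target_url, 'wait': wait}
    if body is not None:
        # Without http_method, Splash would still GET the target.
        payload['http_method'] = 'POST'
        payload['body'] = body
    if headers:
        # Assumption: Splash applies 'headers' only for JSON POSTs
        # to the endpoint, not for GET query parameters.
        payload['headers'] = headers
    return payload


payload = splash_json_payload(
    'http://httpbin.org/post',
    headers={'User-Agent': 'my-crawler/0.1'},  # illustrative value
    body='a=b',
)
print(json.dumps(payload, sort_keys=True))
```

The payload would then be sent with requests.post('http://localhost:8050/render.html', json=payload), mirroring the example above.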