python web-scraping post payload

Sending POST requests - Response 400


I'm trying to scrape questions from Quora, starting from https://www.quora.com/search?q=microwave&type=question. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but that is really slow, so I'm trying a different approach. When you scroll down, Quora sends a POST request with a payload to another URL, so I opened the Network tab in the browser dev tools to see what payload it sends.

It looks like this:

{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}

I ran this:

import requests

url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
data = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76", "queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":'null',"resultType":"question","author":'null',"time":"all_times","first":10,"after":"19","tribeId":'null'},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
r = requests.post(url, data = data)
print(r) 

And got <Response [400]>. I added my user agent and replaced null with 'null'. I also tried None, '', and even deleting those keys from the dict, but nothing gets it to work. So maybe I have the wrong hash: I searched the page's HTML and the other requests the site sends and receives to find where the hash comes from, but didn't succeed.

  1. Is the 400 error caused by the 'null' items?
  2. Is a hash like this common in POST requests, and how can I obtain it? Thanks

Solution

  • First of all, ensure that your payload is properly formatted as JSON, like this:

    import json

    data = json.dumps({
      "queryName": "SearchResultsListQuery",
      "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None
      },
      "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
      }
    })
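
On the 'null' question: the snippet above uses Python's None rather than the string 'null'. A minimal sketch using only the standard library shows the difference after serialization (the "author" key is taken from the payload; the variable names are just for illustration):

```python
import json

# Python's None is serialized as the JSON literal null.
with_none = json.dumps({"author": None})

# The string 'null' stays a JSON string, which is a different value.
with_string = json.dumps({"author": "null"})

print(with_none)    # {"author": null}
print(with_string)  # {"author": "null"}
```

Note also that passing a plain dict as data= makes requests form-encode it instead of sending JSON, which is why the payload is serialized with json.dumps first.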
    

    Also, to get a successful response from the Quora GraphQL API, you must include a cookie in your request headers:

    headers = {
        'cookie': '...',
        ...
    }
    
    r = requests.post(url, headers=headers, data=data)
    

    You can find the cookie in your browser's dev tools.
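
Putting the pieces together, the request could be sketched roughly as follows. This is a sketch under assumptions, not a confirmed recipe: build_request and send are hypothetical helper names, the Content-Type header is an assumption (it is the usual companion to a JSON body, but the answer does not state that Quora requires it), and the endpoint may expect further headers copied from the browser. The cookie and user agent are placeholders you supply yourself. Note that the User-Agent goes in the headers, not in the payload as in the question's code.

```python
import json

URL = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"


def build_request(cookie, user_agent):
    """Return (headers, body) for the search query. Nothing is sent here."""
    payload = {
        "queryName": "SearchResultsListQuery",
        "variables": {
            "query": "microwave",
            "disableSpellCheck": None,  # None becomes JSON null
            "resultType": "question",
            "author": None,
            "time": "all_times",
            "first": 10,
            "after": "19",
            "tribeId": None,
        },
        "extensions": {
            # Hash copied from the request seen in dev tools; it may change.
            "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
        },
    }
    headers = {
        "Cookie": cookie,                    # copied from the browser's dev tools
        "User-Agent": user_agent,            # a header, not part of the payload
        "Content-Type": "application/json",  # assumption, see the note above
    }
    return headers, json.dumps(payload)


def send(cookie, user_agent):
    """Send the POST request (hypothetical helper; needs the requests package)."""
    import requests

    headers, body = build_request(cookie, user_agent)
    return requests.post(URL, headers=headers, data=body, timeout=10)
```

Calling send(my_cookie, my_user_agent) and printing r.status_code and r.text on the result shows whether the server accepted the request. Keeping the payload construction separate from the network call makes it easy to inspect the exact body before sending it.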