I'm trying to scrape the questions listed at https://www.quora.com/search?q=microwave&type=question on Quora. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but that is really slow, so I'm trying a different approach. When you scroll down, Quora sends a POST request to another URL with a payload, so I opened the Network tab in the dev tools to see what payload it uses.
It looks like this:
{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
I ran this:
import requests
url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
data = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76", "queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":'null',"resultType":"question","author":'null',"time":"all_times","first":10,"after":"19","tribeId":'null'},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
r = requests.post(url, data = data)
print(r)
And got <Response [400]>.
I plugged in my user agent and replaced null with 'null'. I also tried None and '', and even deleting these keys from the dict, but nothing gets it to work.
So maybe I have the wrong hash. I looked through the page's HTML and the other requests it sends and receives to find the hash, but didn't succeed.
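First of all, make sure the payload is actually sent as JSON. Passing a dict to data= makes requests form-encode it instead, so serialize it with json.dumps and use None wherever the browser payload has null (the User-Agent belongs in the request headers, not in the payload), like this: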
import json

# json.dumps turns Python None into JSON null, matching the browser payload
data = json.dumps({
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    }
})
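As a quick check of why replacing null with 'null' does not help, the small sketch below serializes both variants with the standard json module. The variable names are only for illustration.

```python
import json

# Python None is serialized as JSON null, which is what the browser sends.
with_none = json.dumps({"author": None})

# The string 'null' is serialized as a JSON string, which is a different value.
with_string = json.dumps({"author": "null"})

print(with_none)    # {"author": null}
print(with_string)  # {"author": "null"}
```

So the payload copied from the dev tools should be written with None in Python, not with 'null' or ''.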
Also, to get a successful response from the Quora GraphQL API, you must include a cookie in your request headers:
headers = {
    'cookie': '...',
    ...
}
r = requests.post(url, headers=headers, data=data)
You can find the cookie in your browser's dev tools, in the request headers of the same POST request in the Network tab.
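Putting the two points together, a minimal sketch could look like the following. The helper names (build_payload, build_headers, fetch_results) are hypothetical, the content-type header is an assumption that is not confirmed anywhere above, and the cookie and user agent have to be copied from your own browser session. Quora may require further headers from the original request, and the hash may change over time, so treat this as a starting point rather than a guaranteed fix.

```python
import json

URL = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"
# Hash copied from the payload seen in the dev tools; it may change over time.
QUERY_HASH = "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"


def build_payload(query, after, first=10):
    """Build the payload as a plain dict (hypothetical helper)."""
    return {
        "queryName": "SearchResultsListQuery",
        "variables": {
            "query": query,
            "disableSpellCheck": None,  # becomes null once serialized
            "resultType": "question",
            "author": None,
            "time": "all_times",
            "first": first,
            "after": after,
            "tribeId": None,
        },
        "extensions": {"hash": QUERY_HASH},
    }


def build_headers(cookie, user_agent):
    """Build request headers (hypothetical helper)."""
    return {
        "cookie": cookie,          # copied from the browser's dev tools
        "user-agent": user_agent,  # sent as a header, not in the payload
        "content-type": "application/json",  # assumption: body is JSON
    }


def fetch_results(query, after, cookie, user_agent):
    """Send the POST request and return the decoded response (hypothetical helper)."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.post(
        URL,
        headers=build_headers(cookie, user_agent),
        data=json.dumps(build_payload(query, after)),
        timeout=30,
    )
    response.raise_for_status()  # raises on a 4xx or 5xx status such as 400
    return response.json()       # assumes the response body is JSON
```

A call would then look like fetch_results("microwave", "19", cookie, user_agent), with both values taken from the browser. Passing json=build_payload(...) instead of data=json.dumps(...) is an equivalent option in requests that serializes the dict and sets the JSON content type for you.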