I am trying to get historical data on stock indices from this website. I want to automate the process.
On the webpage, you have to perform the following steps to get the required data:
Index Type
Select an Index
Select Time Period
I recorded the Network tab in Chrome DevTools while performing the above steps. The request URL is "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString".
After many iterations I finally got a result by copying all the request headers from DevTools and passing them to the headers parameter of requests.post().
My solution is as follows.
import requests
url = "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString"
json_payload = {'name': 'NIFTY AUTO', 'startDate': '01-Feb-2023', 'endDate': '01-Feb-2024'}
headers = {
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br, zstd',
'Accept-Language': 'en-GB,en;q=0.9',
'Connection': 'keep-alive',
'Content-Type': 'application/json; charset=UTF-8',
'Cookie': 'AbCd1234',
'Host': 'www.niftyindices.com',
'Origin': 'https://www.niftyindices.com',
'Referer': 'https://www.niftyindices.com/reports/historical-data',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"'
}
response = requests.post(url, json=json_payload, headers=headers, timeout=30)
print(response.status_code)
print(response.json())
System Info: Windows 10, Python 3.19.3, requests 2.28.1
If I remove any part of the headers, for example the Cookie header (which should be optional), the code above throws a ReadTimeout error. If the timeout parameter is not given, the request simply never terminates. Moreover, as far as I know the Cookie value changes with every session, which defeats the purpose of automating this data extraction.
NOTE: The Cookie header value given here is a dummy; it is not the one used in the actual implementation.
How do I get around setting the Cookie header manually, i.e. fully automate the script?
Is there a solution where I don't have to set so many headers manually?
These are the tested minimal headers required for the request to work.
You don't have to set them manually per session; their values are fixed for this request.
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
"Accept": "application/json, text/javascript, */*; q=0.01",
"X-Requested-With": "XMLHttpRequest"
}
Without any one of the 3 headers above, the request hangs until it times out.
No cookies are involved.
Note that the server is very laggy, so you may need to retry several times on timeout.
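Putting it together, here is a minimal sketch that sends your payload with only the three headers above, wrapped in a simple retry loop for the timeouts. The function name `fetch_historical` and the retry count are my own choices, not part of the site's API:

```python
import requests

URL = "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString"

# The three headers that, in my testing, are sufficient for this endpoint.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
}

def fetch_historical(name, start, end, retries=5, timeout=30):
    """POST the index name and date range, retrying on timeout."""
    payload = {"name": name, "startDate": start, "endDate": end}
    for attempt in range(1, retries + 1):
        try:
            resp = requests.post(URL, json=payload, headers=HEADERS, timeout=timeout)
            resp.raise_for_status()
            return resp.json()
        except requests.Timeout:
            print(f"Timed out (attempt {attempt}/{retries}), retrying...")
    raise RuntimeError(f"Gave up after {retries} timeouts")

# Example usage (hits the live server, so it may still need several retries):
# data = fetch_historical("NIFTY AUTO", "01-Feb-2023", "01-Feb-2024")
```

No Cookie header, no Sec-Fetch-* headers, and no session bootstrapping — the date strings just follow the same DD-Mon-YYYY format captured from DevTools.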