Tags: python, web-scraping, post, python-requests

Cannot make a POST request without a cookie header in Python


Objective

I am trying to get historical data for stock indices from this website (niftyindices.com), and I want to automate the process.

Steps to reproduce the problem

On the webpage, getting the required data takes the following steps:

  1. Select the index type under Select Index Type.
  2. Choose an index under Select an Index.
  3. Enter the time period under Select Time Period.
  4. A table with the index values for the given dates appears on the page.

I recorded the Network tab in Chrome DevTools while doing these steps. The request URL is "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString".

After many attempts I finally got a result by copying all of the request headers and passing them to the headers parameter of requests.post(). My solution is as follows:

import requests

url = "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString"
json_payload = {'name': 'NIFTY AUTO', 'startDate': '01-Feb-2023', 'endDate': '01-Feb-2024'}

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-GB,en;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json; charset=UTF-8',
    'Cookie': 'AbCd1234',
    'Host': 'www.niftyindices.com',
    'Origin': 'https://www.niftyindices.com',
    'Referer': 'https://www.niftyindices.com/reports/historical-data',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"'
}

response = requests.post(url, json=json_payload, headers=headers, timeout=30)

print(response.status_code)
print(response.json())
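To change the index or the date range without editing string literals, the payload can be built from `date` objects. The sketch below assumes the `DD-Mon-YYYY` format seen in `json_payload` is what the endpoint expects for every date; `format_date`, `build_payload` and `parse_response` are hypothetical helpers. The unwrapping step assumes the usual ASP.NET page-method response shape `{"d": ...}` with `"d"` holding a JSON-encoded string. That is a guess based on the `.aspx/...toString` endpoint name, so inspect `response.json()` before relying on it.

```python
import json
from datetime import date
from typing import Any

# Month abbreviations are hard-coded so the output does not depend on the
# process locale (strftime("%b") would).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_date(d: date) -> str:
    """Format a date as DD-Mon-YYYY, e.g. 01-Feb-2023."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def build_payload(index_name: str, start: date, end: date) -> dict:
    """Build a request body with the same keys as the question's json_payload."""
    return {
        "name": index_name,
        "startDate": format_date(start),
        "endDate": format_date(end),
    }


def parse_response(body: dict) -> Any:
    """Unwrap an assumed ASP.NET-style {"d": "<JSON string>"} response."""
    inner = body["d"]
    # If "d" is already a decoded object rather than a string, return it as-is.
    return json.loads(inner) if isinstance(inner, str) else inner
```

With the question's code, `json_payload = build_payload("NIFTY AUTO", date(2023, 2, 1), date(2024, 2, 1))` reproduces the original dict, and `parse_response(response.json())` could replace the bare `print(response.json())` once the response shape is confirmed.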

System info: Windows 10, Python 3.19.3, requests 2.28.1

Problem Description

If I remove any of the headers, even the cookie header (which should be optional), the code above raises a ReadTimeout error; without the timeout parameter, the request simply never returns. Also, as far as I know, the cookie value changes with every new session, so hard-coding it defeats the purpose of automating this data extraction.

NOTE: The cookie value given here is a dummy, not the one used in the actual implementation.

How can I avoid setting the cookie header manually, i.e. fully automate the script?

Is there a solution where I don't have to set so many headers manually?
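The answer below reports that no cookie is needed at all. If one ever does become necessary, the usual way to avoid copying it by hand is to let a `requests.Session` collect it: a session stores cookies from every response and sends them back on later requests. This is a minimal sketch, assuming (unverified) that visiting the historical-data page is what would set such a cookie; `make_session`, `fetch_history` and the constant names are hypothetical.

```python
import requests

# The page the question's Referer header points at. That a GET to it sets
# any cookie the POST needs is an assumption, not confirmed for this site.
LANDING_URL = "https://www.niftyindices.com/reports/historical-data"
API_URL = "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString"

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/122.0.0.0 Safari/537.36")

# Extra headers sent only with the AJAX-style POST.
AJAX_HEADERS = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": LANDING_URL,
}


def make_session() -> requests.Session:
    """Create a Session; it keeps a cookie jar across requests."""
    session = requests.Session()
    session.headers["User-Agent"] = USER_AGENT
    return session


def fetch_history(session: requests.Session, payload: dict, timeout: float = 30.0):
    """GET the landing page, then POST the query with the same session."""
    # Any Set-Cookie headers from this response are stored in session.cookies...
    session.get(LANDING_URL, timeout=timeout)
    # ...and sent automatically with this POST; request-level headers are
    # merged with the session's default headers.
    response = session.post(API_URL, json=payload, headers=AJAX_HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

`fetch_history(make_session(), json_payload)` would replace the `requests.post` call in the question. Reusing one session for several queries also lets requests keep the underlying connection open.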


Solution

  • These are the minimal headers, tested, that the request needs in order to work.

    They don't change between sessions, so you can hard-code them once instead of copying them from the browser each time.

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest"
    }
    

    If any of these three headers is missing, the request hangs until it times out.

    No cookies are needed.

    The server is very slow, so you may need to retry several times when a request times out.

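The three headers and the advice to retry on timeouts combine into the following minimal sketch. The number of attempts, the 30-second timeout and the doubling delay are arbitrary choices, not values tested against this server, and `post_with_retries` is a hypothetical helper. Retrying a POST blindly is only reasonable here because the call just reads data.

```python
import time

import requests

URL = "https://www.niftyindices.com/Backpage.aspx/getHistoricaldatatabletoString"

# The three headers reported as sufficient; no Cookie header.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/122.0.0.0 Safari/537.36"),
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
}


def post_with_retries(payload, attempts=5, timeout=30, backoff=2.0,
                      post=requests.post, sleep=time.sleep):
    """POST the payload, retrying on timeouts with exponential backoff.

    `post` and `sleep` are injectable so the retry logic can be exercised
    without network access.
    """
    for attempt in range(1, attempts + 1):
        try:
            response = post(URL, json=payload, headers=HEADERS, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout:
            # ReadTimeout and ConnectTimeout both subclass Timeout.
            if attempt == attempts:
                raise
            sleep(backoff * 2 ** (attempt - 1))  # waits 2 s, 4 s, 8 s, ...
```

Usage: `rows = post_with_retries({'name': 'NIFTY AUTO', 'startDate': '01-Feb-2023', 'endDate': '01-Feb-2024'})`. A final timeout is re-raised, so the caller still sees a ReadTimeout if every attempt fails.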