
How to send a GET request with data in Python?


I want to get data from this site:
https://www.techstars.com/portfolio?category=all%20companies

As you can see in the browser's Network tab, a GET request is sent to this URL:
https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50

But when I open it, it says "permission denied...", and I get the same response when I send a GET request from Python.

How can I send a GET request to this URL with the correct data?

Here is my code:

import requests
url = 'https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50'
payload = {'order':'name','program_status':'in.(session_in_progress,session_over)}', 'type':'eq.Graduate','session':'not.in.(%22%22)','offset':'0','limit':'50'}
r = requests.get(url, data=payload)
r.content

and it gives me this result:

b'{"hint":null,"details":null,"code":"42501","message":"permission denied for table companies"}'
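In the question's code, data=payload puts the parameters in the request body. For this GET endpoint they belong in the URL's query string, which is what params= produces (the URL passed in already carries them, so the payload is redundant there). The sketch below only builds the requests without sending them, so it makes no network calls. It uses urllib.parse to split the URL copied from the Network tab into a dict of decoded parameters; decoding turns %22%22 into "", which avoids the double encoding that the hand-written 'not.in.(%22%22)' value would receive. This alone does not fix the permission error, which still needs the headers shown in the solution.

```python
from urllib.parse import parse_qsl, urlsplit

import requests

# URL copied from the browser's Network tab (from the question).
CAPTURED_URL = (
    "https://datacore.techstars.com/companies?order=name"
    "&program_status=in.(session_in_progress,session_over)}"
    "&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50"
)

# Split it into the bare endpoint and a dict of *decoded* query parameters.
parts = urlsplit(CAPTURED_URL)
BASE_URL = f"{parts.scheme}://{parts.netloc}{parts.path}"
payload = dict(parse_qsl(parts.query))  # '%22%22' becomes '""'

# Build (but do not send) two GET requests to see where the payload ends up.
with_params = requests.Request("GET", BASE_URL, params=payload).prepare()
with_data = requests.Request("GET", BASE_URL, data=payload).prepare()

print(with_params.url)   # payload re-encoded into the query string
print(with_params.body)  # None: no request body
print(with_data.url)     # bare endpoint, no query string
print(with_data.body)    # payload form-encoded into the request body
```

Passing params=payload (together with the solution's headers) to requests.get should therefore send the same query string the browser sends.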

Solution

  • You also need to supply additional headers for your request to work. For example:

    import requests
    
    # Headers mirroring the browser's request; Authorization is the one the API requires.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
        'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoicGdyZXN0X3d3dzIifQ.RB9HicmPNEl4C0Ree9SVw3Oh5tinjDiIbBurBujVnEg',
        'Accept': 'application/json, text/plain, */*',
        'Origin': 'https://www.techstars.com'
    }
    
    url = "https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&id=in.(001E000001EZFcYIAX,001E000000I0FdNIAV,001E000000SsjXdIAJ,001E000000HzxB9IAJ,001E000000IyUe7IAF,001E000000HzKYCIA3,001E000000IIItfIAH,001E000000IyUJRIA3)}&offset=0&limit=50"
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # raise an exception on a 4xx/5xx response instead of parsing the error body
    
    # The response is a JSON list of company objects.
    for entry in r.json():
        print(f"{entry['name']} - {entry['description']}")
    

    An Authorization header is needed. Its value is probably embedded in the HTML of the main page.
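The Authorization value above is a JSON Web Token (JWT): three base64url segments separated by dots. A rough sketch, assuming the token appears literally somewhere in the page's HTML or in a script it loads, is to search the text for JWT-shaped strings and decode the first two segments (without verifying the signature). The sample_html string is hypothetical markup, and find_jwts and inspect_jwt are illustrative helpers, not part of any library.

```python
import base64
import json
import re
from typing import Dict, List, Tuple

# A JWT's header and payload are JSON objects, so both base64url segments
# start with "eyJ" (the encoding of '{"'); the signature segment follows.
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def find_jwts(text: str) -> List[str]:
    """Return every JWT-shaped string found in a page's HTML or JavaScript."""
    return JWT_PATTERN.findall(text)


def b64url_decode(segment: str) -> bytes:
    """Decode one base64url segment, restoring the '=' padding JWTs omit."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def inspect_jwt(token: str) -> Tuple[Dict, Dict]:
    """Return (header, claims) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Hypothetical markup: whether and where the real page embeds the token is an assumption.
token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJyb2xlIjoicGdyZXN0X3d3dzIifQ"
    ".RB9HicmPNEl4C0Ree9SVw3Oh5tinjDiIbBurBujVnEg"
)
sample_html = f'<script>var apiToken = "{token}";</script>'

found = find_jwts(sample_html)
header, claims = inspect_jwt(found[0])
print(header)  # {'alg': 'HS256', 'typ': 'JWT'}
print(claims)  # {'role': 'pgrest_www2'}
```

Decoding this token gives a single role claim, pgrest_www2, and no expiry claim. Together with the eq./in. filter syntax and the PostgreSQL error code 42501 in the question, this suggests (unconfirmed) a PostgREST-style backend where only that role may read the companies table, which would explain the permission error when the header is missing. Against the live site you would pass the text of the portfolio page instead of sample_html; if nothing turns up there, the JavaScript files the page loads are the next place to look.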

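    This gives output like the following (only eight companies, because the id=in.(...) filter in the URL restricts the results to the listed IDs):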

    Chainalysis - Building the compliance layer for the future of value transfer.
    ClassPass - ClassPass is a membership program for fitness classes across multiple gyms and studios, making working out more accessible.
    DataRobot - DataRobot brings AI technology and ROI enablement services to global enterprises.
    DigitalOcean - The cloud for developers
    Outreach - Outreach is a sales engagement platform that accelerates revenue growth by optimizing interactions throughout the customer lifecycle.
    Remitly - Remitly is a mobile payments service that enables users to make person-to-person international money transfers.
    SendGrid - SendGrid is a cloud-based customer communication platform that drives engagement and business growth.
    Zipline - Zipline is creating a highly automated drone network to shuttle blood and pharmaceuticals to remote locations in hours rather than weeks or months.
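The offset and limit parameters in the captured URL suggest the API returns results 50 at a time. Below is a sketch for collecting more than one page, assuming that a page shorter than limit is the last one and that leaving out the id filter returns all graduate companies (neither is verified against the live API). iter_companies and make_fetch_page are hypothetical helpers; the paging loop accepts any fetch function, so it can be tried without network access, and the live version reuses one requests.Session carrying the headers.

```python
from typing import Callable, Iterator

import requests

BASE_URL = "https://datacore.techstars.com/companies"

# fetch_page(offset, limit) -> one page of company dicts (hypothetical signature).
FetchPage = Callable[[int, int], list]


def iter_companies(fetch_page: FetchPage, limit: int = 50) -> Iterator[dict]:
    """Yield companies page by page; a page shorter than `limit` is treated as the last."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit


def make_fetch_page(headers: dict) -> FetchPage:
    """Build a fetch_page that queries the live endpoint through one requests.Session."""
    session = requests.Session()
    session.headers.update(headers)  # sent with every request made by this session

    def fetch_page(offset: int, limit: int) -> list:
        params = {
            "order": "name",
            # Copied from the captured URL, including its trailing '}'.
            "program_status": "in.(session_in_progress,session_over)}",
            "type": "eq.Graduate",
            "session": 'not.in.("")',
            "offset": offset,
            "limit": limit,
        }
        response = session.get(BASE_URL, params=params, timeout=30)
        response.raise_for_status()
        return response.json()

    return fetch_page
```

With the headers dict from the solution, list(iter_companies(make_fetch_page(headers))) would then collect every page.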