Tags: web-scraping, python-requests, http-status-code-403

Python requests 403 Forbidden referer from network headers


This request used to work but now gets a 403. I tried adding a user agent like in this answer but still no good: https://stackoverflow.com/a/38489588/2415706

This second answer further down says to find the referer header but I can't figure out where these response headers are: https://stackoverflow.com/a/56946001/2415706

import requests

# Browser-like headers copied from the DevTools network tab
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "referer": "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State",
}
job_url = "https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State"
job_response = requests.get(job_url, headers=headers, timeout=10)
print(job_response)  # prints <Response [403]> when the request is blocked

This is what I see under Request Headers for the first tab after refreshing the page but there's too much stuff. I assume I only need one of these lines.

:authority: www.ziprecruiter.com
:method: GET
:path: /Salaries/What-Is-the-Average-Programmer-Salary-by-State
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: max-age=0
cookie: __cfduid=dea4372c39465cfa2422e97f84dea45fb1620355067; zva=100000000%3Bvid%3AYJSn-w3tCu9yJwJx; ziprecruiter_browser=99.31.211.77_1620355067_495865399; SAFESAVE_TOKEN=1a7e5e90-60de-494d-9af5-6efdab7ade45; zglobalid=b96f3b99-1bed-4b7c-a36f-37f2d16c99f4.62fd155f2bee.6094a7fb; ziprecruiter_session=66052203cea2bf6afa7e45cae7d1b0fe; experian_campaign_visited=1
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"
sec-ch-ua-mobile: ?0
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36
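
Rather than picking a single line, a copied dump like this can be turned into a dict and passed to requests as a whole. The following is only a rough sketch: `parse_devtools_headers` is a hypothetical helper (not part of requests), the dump is shortened, and there is no guarantee that sending more headers changes the 403. Lines starting with a colon are HTTP/2 pseudo-headers rather than regular headers, so they are skipped, and the cookie is dropped by default because it belongs to one browser session.

```python
def parse_devtools_headers(raw, skip=("cookie",)):
    """Turn a 'name: value' dump copied from DevTools into a dict.

    Hypothetical helper for illustration; not part of requests.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # HTTP/2 pseudo-headers (":authority", ":method", ...) start with a
        # colon and are not regular headers, so they are skipped.
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        name = name.strip().lower()
        if name in skip:
            continue
        headers[name] = value.strip()
    return headers


# Shortened version of the dump above
raw_dump = """\
:authority: www.ziprecruiter.com
:method: GET
accept-language: en-US,en;q=0.9
cache-control: max-age=0
cookie: experian_campaign_visited=1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36
"""

headers = parse_devtools_headers(raw_dump)
print(sorted(headers))
# ['accept-language', 'cache-control', 'upgrade-insecure-requests', 'user-agent']
```

The resulting dict can be passed as `headers=headers` to `requests.get`.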

EDIT: Looking at the requests in the other tabs, they include a referer header ("referer": "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State"), so I tried adding that, but it is still 403.


Solution

  • Using the httpx package, the request seems to work:

    import httpx
    
    url = 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State'
    
    r = httpx.get(url)
    
    print(r.text)
    print(r.status_code)
    print(r.http_version)
    

    repl.it: https://replit.com/@bertrandmartel/ZipRecruiter

    I may be wrong, but I think the server didn't like the TLS negotiation of the requests library. It's odd, since the httpx call above uses HTTP/1.1 for the request, while with curl it only works with HTTP/2 and TLS 1.3.

    Using a curl binary built with HTTP/2 support and an OpenSSL that supports TLS 1.3, the following works:

    docker run --rm curlimages/curl:7.76.1 \
        --http2 --tlsv1.3 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State' \
        -H 'user-agent: Mozilla' \
        -s -o /dev/null -w "%{http_code}"
    

    returns:

    301
    

    The following commands fail:

    • forcing HTTP/1.1 and enforcing TLS 1.3:
    docker run --rm curlimages/curl:7.76.1 \
        --http1.1 --tlsv1.3 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State' \
        -H 'user-agent: Mozilla' \
        -s -o /dev/null -w "%{http_code}"
    

    Output: 403

    • forcing HTTP/2 and TLS 1.2 (note that curl's --tlsv1.2 only sets the minimum version; capping it at 1.2 would also need --tls-max 1.2):
    docker run --rm curlimages/curl:7.76.1 \
        --http2 --tlsv1.2 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State' \
        -H 'user-agent: Mozilla' \
        -s -o /dev/null -w "%{http_code}"
    

    Output: 403

    My guess is that the server detects something in the TLS negotiation, but that the check is different when both TLS 1.3 and HTTP/2 are used.

    Unfortunately, you can't test HTTP/2 with requests/urllib, since they don't support it.
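
What can be checked from Python without extra packages is which TLS version and ALPN protocol the server agrees to for a given client configuration. The following is a minimal sketch using only the standard `ssl` and `socket` modules. `build_context` and `probe` are hypothetical helper names, and the sketch only reports the handshake result. It sends no HTTP request, so it does not reproduce the 403 check itself, and Python's TLS handshake may look different to the server than curl's or a browser's.

```python
import socket
import ssl


def build_context(min_version=ssl.TLSVersion.TLSv1_2,
                  max_version=None,
                  alpn=("h2", "http/1.1")):
    """Create a client TLS context with a version range and ALPN offer."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    if max_version is not None:
        ctx.maximum_version = max_version
    # ALPN is how client and server agree on HTTP/2 ("h2") or HTTP/1.1
    ctx.set_alpn_protocols(list(alpn))
    return ctx


def probe(host, ctx, port=443, timeout=10):
    """Connect and return (negotiated TLS version, selected ALPN protocol)."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.selected_alpn_protocol()
```

For example, `probe("www.ziprecruiter.com", build_context(min_version=ssl.TLSVersion.TLSv1_3))` returns a `(tls_version, alpn_protocol)` tuple for a handshake that requires TLS 1.3, while passing `max_version=ssl.TLSVersion.TLSv1_2` caps the handshake at TLS 1.2.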