
http.client works but requests throws read timeout


I'm just trying to understand this behavior. When I use requests, the request returns a 403 (without headers) or a read timeout (with headers). Doing the same thing with http.client returns a 200 status code.

The URL I'm trying to fetch is: https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg

Code that fails:

import requests

url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'

try:
    response = requests.get(url, verify=False, timeout=10)  # Disable SSL verification
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print("Error:", e)

Code that works:

import http.client
import ssl

conn = http.client.HTTPSConnection("img.uefa.com", context=ssl._create_unverified_context())
conn.request("GET", "/imgml/uefacom/uel/social/og-default.jpg")
response = conn.getresponse()
print(response.status, response.reason)
conn.close()

I've tried many things, including adding multiple headers, but none of them worked.

The following curl command also works:

curl -v "https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg" --output image.jpg

Also opening in browser works.

Note: all requests are made locally.

Does requests do anything extra that could cause this problem?


Solution

  • Some sites will reject traffic from clients with "invalid" user agent strings.

    If you print the default headers that the requests library uses, you can see that the request is explicitly identified as coming from a Python script:

    {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
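
    You can inspect these defaults yourself; a fresh Session carries the same headers that a bare requests.get call sends:

    import requests

    # Default headers attached to every request made through requests
    print(requests.Session().headers)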
    

    The site owner likely wants to limit bots and web scraping, so this user agent is rejected. http.client, by contrast, sends a much more minimal request (it adds no User-Agent header by default), which apparently isn't being filtered out.
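
    If you want to confirm what http.client actually sends on the wire, its built-in debug mode prints the raw request and response headers (a quick sketch reusing your working snippet):

    import http.client
    import ssl

    conn = http.client.HTTPSConnection("img.uefa.com", context=ssl._create_unverified_context())
    conn.set_debuglevel(1)  # echo the raw request/response exchange to stdout
    conn.request("GET", "/imgml/uefacom/uel/social/og-default.jpg")
    response = conn.getresponse()
    print(response.status, response.reason)
    conn.close()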

    The code below, which sets a browser-like user agent, works fine:

    import requests

    url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'

    headers = {
        'User-Agent': (
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
            'AppleWebKit/605.1.15 (KHTML, like Gecko) '
            'Version/17.6 Safari/605.1.15'
        )
    }

    try:
        response = requests.get(
            url,
            headers=headers,
            verify=True,   # keep SSL verification enabled
            timeout=10,    # give up after 10 seconds
        )
        print(response.status_code)
    except requests.exceptions.RequestException as e:
        print("Error:", e)