I'm just trying to understand this. When using requests,
the request fails with a 403 (without headers) or a read timeout (with headers). Doing the same thing with http.client
returns a 200 status code.
The URL I'm trying to fetch is: https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg
Code that fails:

import requests

url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'

try:
    response = requests.get(url, verify=False, timeout=10)  # Disable SSL verification
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print("Error:", e)
Code that works:

import http.client
import ssl
conn = http.client.HTTPSConnection("img.uefa.com", context=ssl._create_unverified_context())
conn.request("GET", "/imgml/uefacom/uel/social/og-default.jpg")
response = conn.getresponse()
print(response.status, response.reason)
conn.close()
I've tried many things, including adding multiple headers, but none of them worked.
The following curl command also works:
curl -v "https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg" --output image.jpg
Opening the URL in a browser works too.
Note: all requests were made locally.
Does requests do any extra step that could cause this problem?
Some sites reject traffic from clients with "invalid" User-Agent strings.
If you print the default headers that the requests library sends, you can see that the request explicitly identifies itself as coming from a Python script:
{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
The site owner likely wants to limit bots and web scraping, so this User-Agent is blocked. http.client, by contrast, sends no User-Agent header at all by default, so there is nothing for the filter to match on.
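You can inspect these defaults yourself by printing the headers of a fresh Session (a short sketch; the exact version number in the User-Agent will vary with your installed requests version):

```python
import requests

# A new Session carries requests' default headers,
# including the python-requests User-Agent string
# that the server is likely filtering on.
session = requests.Session()
print(dict(session.headers))
```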
The code below, with a browser-like User-Agent, works fine:
import requests

url = 'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg'
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
        'AppleWebKit/605.1.15 (KHTML, like Gecko) '
        'Version/17.6 Safari/605.1.15'
    )
}

try:
    response = requests.get(
        url,
        headers=headers,
        verify=True,  # SSL verification enabled (set to False only if you must)
        timeout=10,   # Give up after 10 seconds
    )
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print("Error:", e)
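If you fetch from this host more than once, you can also set the header once on a Session so that every request made through it carries the browser-like User-Agent (a minimal sketch; the UA string is just one example value, not the only one that will pass the filter):

```python
import requests

session = requests.Session()
# Merge the browser-like User-Agent into the session's default headers;
# it will be sent on every request made through this session.
session.headers.update({
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
        'AppleWebKit/605.1.15 (KHTML, like Gecko) '
        'Version/17.6 Safari/605.1.15'
    )
})

response = session.get(
    'https://img.uefa.com/imgml/uefacom/uel/social/og-default.jpg',
    timeout=10,
)
print(response.status_code)
```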