python python-3.x selenium-webdriver web-scraping python-requests

Scrapers blocked but not browser

I am trying to scrape https://www.rule34video.com/ using Python.

At first, a simple requests.get() worked, but every attempt on the following day failed. I did let Windows install updates in between; I'm not sure whether that is the cause. I tried including headers:

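import requests

# Target site (the original snippet used `url` without defining it)
url = 'https://www.rule34video.com/'

# Send a desktop Chrome User-Agent instead of the default python-requests one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
print(requests.get(url, headers=headers).text)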

But this is what I get:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='rule34video.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000025316C12430>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Then I tried Selenium as a last resort, but the result was the same: it can't reach the website at all.

This is what I see on the loaded HTML page:

502 Bad Gateway

ProtocolException('Server connection to (\'rule34video.com\', 443) failed: Error connecting to "rule34video.com": [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')

I am almost certain that my IP address is blacklisted. However, when I visit https://rule34video.com/ in Google Chrome, it loads with no problem at all.

My questions are:

How does Google Chrome avoid getting blocked?
What can I do to bypass the scraping protection?

Solution

Websites have different ways to detect scrapers and bots.
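
Before assuming bot detection, it may be worth confirming where the failure happens. The WinError 10060 in the question is a TCP connection timeout, not an HTTP response from the site, so a quick check is whether Python can open a socket to the host at all. The helper below is only a rough sketch; the name can_connect and its default timeout are my own choices, not part of any library.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures
        return False
```

If can_connect('rule34video.com') returns False while Chrome on the same machine loads the page, the difference is likely in the network path (for example a proxy, VPN, or firewall rule that only one of them uses) rather than in the request headers.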

After some searching, I found that I can get past these protections using the undetected (UC) mode of the SeleniumBase framework:

https://seleniumbase.io/help_docs/uc_mode/
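
For reference, here is roughly how UC Mode can be used to load the page and read its HTML. Treat it as a minimal sketch rather than the exact code I ran: the function name fetch_page_source and the 4-second reconnect time are illustrative assumptions, and it requires seleniumbase (pip install seleniumbase) plus a local Chrome installation.

```python
def fetch_page_source(url, reconnect_time=4):
    """Open `url` with SeleniumBase UC Mode and return the page's HTML."""
    # Imported inside the function so the module loads even without seleniumbase
    from seleniumbase import SB

    # uc=True starts Chrome in undetected-chromedriver (UC) mode
    with SB(uc=True) as sb:
        # Open the URL, disconnecting chromedriver for `reconnect_time`
        # seconds while the page loads, then reconnecting
        sb.uc_open_with_reconnect(url, reconnect_time)
        return sb.get_page_source()
```

Calling fetch_page_source('https://rule34video.com/') should return the same HTML a normal browser session sees, assuming the site is reachable from your network in the first place.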