Tags: python, parsing, selenium-webdriver, web-scraping

Selenium: emulate the site you came from


When I open the URL directly via the link https://kinoxor.pro/650-mir-druzhba-zhvachka-2024-05-06-19-54.html, I get an Internal Server Error.

But when I paste the link into a search engine (https://yandex.ru/search/?text=https%3A%2F%2Fkinoxor.pro%2F650-mir-druzhba-zhvachka-2024-05-06-19-54.html&search_source=dzen_desktop_safe&lr=213) and click the first result, the site opens without the error.

I am opening the site with Selenium:

from selenium import webdriver

driver = webdriver.Chrome()
url = "https://kinoxor.pro/650-mir-druzhba-zhvachka-2024-05-06-19-54.html"
driver.get(url)

I think I need to emulate the site I came from by adding some headers or other meta-information to the request. In other words, the request should look as if my previous page was yandex.ru, without actually visiting Yandex first.

Is this possible in Python?


Solution

  • I think you can try Selenium Wire instead; it lets you intercept and modify the requests the browser sends.

    https://pypi.org/project/selenium-wire/#intercepting-requests-and-responses

    Install it with pip install selenium-wire

    You could try something like this:

    from seleniumwire import webdriver

    driver = webdriver.Chrome()

    def interceptor(request):
        # Runs for every request the browser makes.
        # Delete any existing Referer first so the header is replaced, not duplicated.
        del request.headers['Referer']
        request.headers['Referer'] = 'https://yandex.ru/'

    driver.request_interceptor = interceptor

    url = "https://kinoxor.pro/650-mir-druzhba-zhvachka-2024-05-06-19-54.html"

    driver.get(url)
    

    If you want to use plain Selenium only, you can try this:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    chrome_options = Options()
    referer = "https://www.yandex.ru"
    chrome_options.add_argument(f"--referer={referer}")
    
    driver = webdriver.Chrome(options=chrome_options)
    
    url = "https://kinoxor.pro/650-mir-druzhba-zhvachka-2024-05-06-19-54.html"
    driver.get(url)
    
    

    However, this might not work, as I don't think Selenium has built-in referrer spoofing.
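
Another option that stays within plain Selenium, offered as an untested sketch: Chromium-based drivers expose execute_cdp_cmd, and the Chrome DevTools Protocol has a Network.setExtraHTTPHeaders command that adds extra headers to the requests the page sends. The helper names set_referer and open_with_referer below are made up for illustration, and whether the site accepts the spoofed header is an assumption that has not been verified here.

```python
def set_referer(driver, referer):
    """Ask Chrome, via the DevTools Protocol, to send `referer` as the Referer header.

    `driver` is assumed to be a Chromium-based Selenium driver, since
    execute_cdp_cmd is only available on those.
    """
    # Enable the Network domain before configuring it.
    driver.execute_cdp_cmd("Network.enable", {})
    # Extra headers are attached to the requests issued by the page from now on.
    driver.execute_cdp_cmd(
        "Network.setExtraHTTPHeaders",
        {"headers": {"Referer": referer}},
    )


def open_with_referer(url, referer):
    """Hypothetical convenience wrapper: start Chrome, set the header, open the URL."""
    # Imported here so set_referer can be used without Selenium being imported first.
    from selenium import webdriver

    driver = webdriver.Chrome()
    set_referer(driver, referer)
    driver.get(url)
    return driver
```

Calling open_with_referer(url, "https://yandex.ru/") with the URL from the question should then load the page with that header set. Like the Selenium Wire interceptor, this applies to every request and not only to the first navigation.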