python, header, selenium-chromedriver, contacts, user-agent

Add contact information to user-agent using selenium chromedriver


I am working on a web scraping project and would like to add my contact email to the User-Agent, so that the website's admin can reach me if they have questions about the scraping or want me to stop.

I have found the following documentation on user-agents:

import requests

headers = {
    "User-Agent": "my web scraping program. contact me at [email protected]"
}
r = requests.get("http://example.com", headers=headers)

This example involves requests instead of chromedriver. I am wondering if anyone knows how I can add this type of header information to my user-agent while using selenium/chromedriver. This is my code so far:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# webdriver_manager downloads a matching chromedriver; Selenium 4 takes its path via Service
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.example.com')  # the URL must include the scheme
link_url = driver.find_element(By.TAG_NAME, 'a')
time.sleep(10)
html = driver.page_source
driver.quit()

I am unsure how and where I can define my header with my contact information. Any ideas? Thanks!


Solution

  • Selenium's WebDriver API has no method for setting arbitrary request headers.

    If you REALLY have to send custom headers with Selenium, one option is to route the browser through browsermob-proxy: https://github.com/lightbody/browsermob-proxy

    This is what that would look like:

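    from browsermobproxy import Server
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    
    headers = {'User-Agent': 'webscraper - email'}
    
    # Start BrowserMob Proxy and open a proxy port on it
    server = Server(path='path to browsermob-proxy')
    server.start()
    client = server.create_proxy()
    # Add/override these headers on every request that passes through the proxy
    client.headers(headers)
    proxy = client.proxy  # 'host:port' string
    chrome_options.add_argument(f'--proxy-server={proxy}')
    # The proxy re-signs HTTPS traffic with its own certificate; without this Chrome rejects it
    chrome_options.add_argument('--ignore-certificate-errors')
    
    driver = webdriver.Chrome(options=chrome_options)
    
    driver.get('your URL')
    
    # ... scrape, then shut down both the browser and the proxy
    driver.quit()
    server.stop()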
    

    The line below makes the Selenium-controlled browser window open maximized. Alternatively, you could pass '--headless' so that no browser window opens when you run the script.

    chrome_options.add_argument('--start-maximized')
    

    Similarly, the line below tells Chrome to send its traffic through the proxy server we just created. Routing the connection through the proxy is what lets you add headers to it.

    chrome_options.add_argument(f'--proxy-server={proxy}')
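For the User-Agent alone, a proxy may not be necessary. Chrome accepts a --user-agent command-line switch, which can be passed through Options.add_argument, and Selenium 4's Chrome driver exposes execute_cdp_cmd, through which the DevTools command Network.setExtraHTTPHeaders can attach further headers, such as the standard From header intended for a contact email. The following is a minimal sketch assuming Selenium 4+ and a local Chrome install; the helper names (build_user_agent, user_agent_argument, scrape_with_contact_headers) and the email address are hypothetical.

```python
# Sketch: identify a Selenium/Chrome scraper without a proxy.
# Assumes Selenium 4+ and a local Chrome install; helper names are hypothetical.


def build_user_agent(project: str, contact_email: str) -> str:
    """Return a User-Agent string naming the scraper and a contact address."""
    return f"{project} (contact: {contact_email})"


def user_agent_argument(user_agent: str) -> str:
    """Format a value as Chrome's --user-agent command-line switch."""
    return f"--user-agent={user_agent}"


def scrape_with_contact_headers(url: str, project: str, contact_email: str) -> str:
    """Load url in Chrome with a contact User-Agent and From header; return the HTML."""
    # Imported here so the pure helpers above work even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument(user_agent_argument(build_user_agent(project, contact_email)))

    driver = webdriver.Chrome(options=options)
    try:
        # Chrome DevTools Protocol: send an extra From header with every request.
        # RFC 9110 defines From as an email address for the human controlling the client.
        driver.execute_cdp_cmd("Network.enable", {})
        driver.execute_cdp_cmd(
            "Network.setExtraHTTPHeaders", {"headers": {"From": contact_email}}
        )
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like html = scrape_with_contact_headers("https://www.example.com", "my web scraping program", "you@example.com"). Both mechanisms are Chrome-specific: execute_cdp_cmd is only available on Chromium-based drivers, so for other browsers the proxy approach above remains the general route.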