python selenium-webdriver digital-ocean

Why is my Selenium scraping script not working on a DigitalOcean droplet?


I'm working with DigitalOcean droplets for the first time, and I can't get my Selenium script to work. At first I was getting the "DevToolsActivePort file doesn't exist" error; now the script returns nothing and never finishes. I tried specifying a remote debugging port and the location of the chromium-browser binary, but nothing seems to work.

This is my code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.binary_location = "/usr/bin/chromium-browser"
options.add_argument("--user-data-dir=/home/username/myproject")
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Chrome(options=options)

base_url = 'https://www.wikipedia.org/'
driver.get(base_url)

table_rows = driver.find_element(By.CSS_SELECTOR, ".footer-sidebar-text")
text = table_rows.text
print(text)
driver.quit()

For context, if it helps, the code works locally with just this:

options = Options()
driver = webdriver.Chrome(options=options)
driver.maximize_window()

What do I need to do to fix this? Thank you!

EDIT: A note for anyone having the same issue: follow Barry's code below, but first make sure your droplet has enough memory for Chrome to install properly. I had to resize mine to 1 GB of memory, and that resolved the issues and errors.
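One way to check how much memory the droplet has before installing Chrome is to read /proc/meminfo. This is only a minimal sketch, assuming a Linux droplet; the function names `parse_meminfo` and `total_memory_mb` are made up for illustration and are not part of any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def total_memory_mb(path="/proc/meminfo"):
    """Return the total RAM in MiB as reported by the Linux kernel."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemTotal"] // 1024
```

Calling `print(total_memory_mb())` on the droplet shows the figure to compare against the 1 GB mentioned above. The kernel usually reports somewhat less than the nominal droplet size, so treat the number as approximate.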


Solution

  • First, make sure Chrome or Chromium is installed. If it isn't, install it with apt, or download it with wget and install it manually.

    Then do:

    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("--headless=new")
    # Use /tmp instead of /dev/shm, which is often small on servers.
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-gpu")
    # Chrome refuses to start as root unless the sandbox is disabled.
    options.add_argument("--no-sandbox")

    # You should be able to remove this line; Selenium should use the default location.
    options.binary_location = "/usr/bin/chromium-browser"
    # I'm not sure about this one; I cannot debug it.
    options.add_argument("--user-data-dir=/home/username/myproject")
    options.add_argument("--remote-debugging-port=9222")
    driver = webdriver.Chrome(options=options)
    

    See the Selenium documentation for more details.

    EDIT: To install a non-snap version of Chrome, download the .deb package:

    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb 
    

    Then install it:

    apt install ./google-chrome-stable_current_amd64.deb
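Putting the steps above together, a minimal sketch of a server-side scrape might look like the following. The helper names (`find_browser`, `server_chrome_args`, `scrape_footer`) are hypothetical, the list of binary names is an assumption about a typical Debian/Ubuntu droplet, and the 30-second timeout is arbitrary. The URL and CSS selector are the ones from the question.

```python
import shutil

# Candidate executable names; an assumption about typical Debian/Ubuntu installs.
BROWSER_NAMES = ("google-chrome", "google-chrome-stable", "chromium-browser", "chromium")


def find_browser(names=BROWSER_NAMES, which=shutil.which):
    """Return the path of the first browser executable found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


def server_chrome_args():
    """Return the Chrome flags suggested in the answer for a display-less server."""
    return [
        "--headless=new",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
    ]


def scrape_footer(url="https://www.wikipedia.org/", timeout=30):
    """Open the page in headless Chrome and return the footer sidebar text."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    if find_browser() is None:
        raise RuntimeError("No Chrome/Chromium binary found on PATH; install one first")

    options = Options()
    for arg in server_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        # Raise an exception if the page load takes longer than `timeout` seconds.
        driver.set_page_load_timeout(timeout)
        driver.get(url)
        return driver.find_element(By.CSS_SELECTOR, ".footer-sidebar-text").text
    finally:
        # Always shut the browser down, even if the scrape fails.
        driver.quit()
```

Calling `print(scrape_footer())` runs the scrape. The `try`/`finally` is a design choice to avoid leaving orphaned Chrome processes on a small droplet. The page-load timeout only covers `driver.get`, so it will not help if Chrome itself fails to start.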