I'm trying to run Selenium periodically within AWS MWAA, but chromedriver exits with status code -5 every single time. I've tried to google this status code without success. Any ideas as to what's causing this error? Alternatively, how should I be running Selenium with AWS MWAA? One suggestion I saw was to run Selenium in a Docker container alongside Airflow, but that isn't possible with AWS MWAA.
Code
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromiumService
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.os_manager import ChromeType
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(
    service=ChromiumService(
        ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
    ),
    options=options,
)
Error: chromedriver exits with status code -5
>>> options = Options()
>>> options.add_argument("--headless=new")
>>> driver = webdriver.Chrome(
...     service=ChromiumService(
...         ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
...     ),
...     options=options,
... )
DEBUG:selenium.webdriver.common.driver_finder:Skipping Selenium Manager; path to chrome driver specified in Service class: /usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver
DEBUG:selenium.webdriver.common.service:Started executable: `/usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver` in a child process with pid: 19414 using 0 to output -3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/chromium/webdriver.py", line 55, in __init__
    self.service.start()
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/common/service.py", line 102, in start
    self.assert_process_still_running()
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/common/service.py", line 115, in assert_process_still_running
    raise WebDriverException(f"Service {self._path} unexpectedly exited. Status code was: {return_code}")
selenium.common.exceptions.WebDriverException: Message: Service /usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver unexpectedly exited. Status code was: -5
Versions
selenium==4.21.0
webdriver-manager==4.0.2
chromedriver==114.0.5735.90
To reproduce this error, you can download AWS MWAA localrunner v2.8.1, install the requirements above, bash into the container (docker exec -it {container_id} /bin/bash) and run the script.
I originally tried to make this work without root privileges because of a misunderstanding. As a result, there are now two ways to set up the environment!
And yes, you need Chrome.
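A note on the -5 from the question: Selenium is reporting the return code from Python's subprocess module, where a negative value -N on POSIX means the child process was terminated by signal N. The sketch below decodes such a code. The helper name describe_return_code is hypothetical, and the signal name depends on the platform (on Linux, signal 5 is SIGTRAP).

```python
import signal


def describe_return_code(return_code):
    """Describe a subprocess return code (hypothetical helper, not part of Selenium)."""
    if return_code is None:
        return "still running"
    if return_code < 0:
        # POSIX convention: -N means the child was terminated by signal N.
        try:
            return f"killed by {signal.Signals(-return_code).name}"
        except ValueError:
            return f"killed by unknown signal {-return_code}"
    return f"exited with status {return_code}"


print(describe_return_code(-5))
```

This only tells you how the process died, not why. A missing browser or missing shared libraries are plausible causes, which is what the rest of this answer addresses.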
I'm proud to say this method does not require root privileges. From what you described, I understood that you couldn't install programs, so you couldn't run anything that needed root. That's okay. Here's a working method, and it now sounds like you're leaning more towards it anyway.
I have provided a setup Python script here (setup.py). Run it inside the environment, and it will set up everything for you.
Basically, it downloads Chrome, ChromeDriver, and the libraries they need to run (which I had previously installed using root privileges). Then it extracts them, makes the binaries executable, and adds the library directory to LD_LIBRARY_PATH so the binaries can find the libraries.
This is what it looks like:
import os
import subprocess
import zipfile


def unzip_file(name, path):
    """
    Unzips a file, then deletes the archive

    Args:
        name (str): The name of the zip file to unzip
        path (str): The path to the extract directory
    """
    print(f"Unzipping {name} to {path}...")
    # Open the ZIP file
    with zipfile.ZipFile(name, 'r') as zip_ref:
        # Extract all contents into the specified directory
        zip_ref.extractall(path)
    print("Extraction complete!")
    delete_file(name)


def download_file(url):
    """
    Downloads the file from a given url into the current directory

    Args:
        url (str): The url to download the file from
    """
    download = subprocess.run(["wget", url], capture_output=True, text=True)
    # Print the output of the command (wget writes its progress to stderr)
    print(download.stdout)
    if download.returncode != 0:
        print(download.stderr)


def delete_file(path):
    """
    Deletes a file if it exists

    Args:
        path (str): The path to the file to delete
    """
    # Check if the file exists before attempting to delete
    if os.path.exists(path):
        os.remove(path)
        print(f"File {path} has been deleted.")
    else:
        print(f"The file {path} does not exist.")


def write_to_bashrc(line):
    """
    Appends a line to ~/.bashrc unless it is already there

    Args:
        line (str): The line to write (including the trailing newline)
    """
    # Path to the ~/.bashrc file
    bashrc_path = os.path.expanduser("~/.bashrc")
    # Check if the line is already in the file (the file may not exist yet)
    lines = []
    if os.path.exists(bashrc_path):
        with open(bashrc_path, 'r') as file:
            lines = file.readlines()
    if line not in lines:
        with open(bashrc_path, 'a') as file:
            file.write(line)
        print(f"{line} has been added to ~/.bashrc")
    else:
        print("That is already in ~/.bashrc")


if __name__ == '__main__':
    download_file("https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chrome-linux64.zip")
    unzip_file("chrome-linux64.zip", ".")
    subprocess.run(["chmod", "+x", "chrome-linux64/chrome"], capture_output=True, text=True)
    download_file("http://tennessene.github.io/chrome-libs.zip")
    unzip_file("chrome-libs.zip", "libs")
    download_file("https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chromedriver-linux64.zip")
    unzip_file("chromedriver-linux64.zip", ".")
    subprocess.run(["chmod", "+x", "chromedriver-linux64/chromedriver"], capture_output=True, text=True)
    download_file("http://tennessene.github.io/driver-libs.zip")
    unzip_file("driver-libs.zip", "libs")
    current_directory = os.path.abspath(os.getcwd())
    library_line = f"export LD_LIBRARY_PATH={current_directory}/libs:$LD_LIBRARY_PATH\n"
    write_to_bashrc(library_line)
    # Sourcing ~/.bashrc from a child process cannot change this script's or the
    # calling shell's environment, so ask the user to do it instead
    print("Run 'source ~/.bashrc' or open a new shell to apply the change.")
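Keep in mind that the export written to ~/.bashrc only takes effect in shells that read that file. It does not change the environment of a Python process that is already running, such as an Airflow task. A minimal sketch of one alternative is to build the environment explicitly and hand it to the child process. The helper name env_with_libs is hypothetical, and the libs directory is the one created by setup.py.

```python
import os


def env_with_libs(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.abspath(lib_dir)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the bundled libraries win over any system copies.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else lib_dir
    return env
```

The resulting mapping can be passed as the env argument of subprocess.run. As far as I know, Selenium 4's Service class also accepts an env argument, but treat that as an assumption and check it against the version you have installed.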
The second method uses root privileges. First, I would install Chrome. Here you can download the .rpm package directly from Google.
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
Make sure to install it
sudo rpm -i google-chrome-stable_current_x86_64.rpm
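To confirm that the install actually put a browser on the PATH, something like the following sketch can help. The helper name find_browser and the candidate executable names are assumptions on my part; adjust them to whatever the package installs on your system.

```python
import shutil


def find_browser(candidates=("google-chrome", "google-chrome-stable",
                             "chromium", "chromium-browser")):
    """Return the path of the first candidate found on PATH, or None.

    Hypothetical helper; the candidate names are common ones, not guaranteed.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_browser())
```

If this prints None, ChromeDriver has no browser to launch unless you point it at one with options.binary_location.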
Next, I would just download ChromeDriver. The builds are offered here.
wget https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chromedriver-linux64.zip
Extract it
unzip chromedriver-linux64.zip
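One thing to watch: the .rpm above tracks whatever the current stable Chrome is, while the ChromeDriver URL pins 127, and ChromeDriver generally expects a browser with the same major version. A small sketch for comparing the two, assuming the usual shape of the --version output (the sample strings in the usage note are illustrative, and the helper names are hypothetical):

```python
import re


def major_version(version_text):
    """Extract the major version from text like 'ChromeDriver 127.0.6533.119 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"No version number found in: {version_text!r}")
    return int(match.group(1))


def versions_compatible(chrome_text, driver_text):
    """Return True when browser and driver share the same major version."""
    return major_version(chrome_text) == major_version(driver_text)
```

For example, versions_compatible("Google Chrome 127.0.6533.119", "ChromeDriver 127.0.6533.119 (abc)") is True, while a Chrome 128 string paired with the same driver string gives False. The inputs would come from running google-chrome --version and chromedriver-linux64/chromedriver --version.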
Here's a little bit of background info before the last step. As you probably already know, AWS MWAA uses Amazon Linux 2, which is similar to CentOS/RHEL. I found the libraries I needed (the libraries here are for Ubuntu) when I stumbled across one of them packaged for Oracle Linux. They were under different names (e.g. nss instead of libnss3). I then looked at Amazon's package repository and they were there, under names similar to Oracle Linux's packages. The libraries I ended up needing for ChromeDriver were nss, nss-utils, nspr, and libxcb.
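If you need to work out which shared libraries are missing on your own image, ldd lists a binary's dependencies and marks unresolved ones with "=> not found". A minimal sketch that collects those names follows. The helper names are hypothetical, and mapping a library file name to a package name is still a manual step.

```python
import subprocess


def parse_missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibnss3.so => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary_path):
    """Run ldd on a binary and return its unresolved shared libraries."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True, check=False)
    return parse_missing_libs(result.stdout)
```

Calling missing_libs("chromedriver-linux64/chromedriver") before and after installing the packages should show the list shrinking to empty.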
Finally, install those pesky libraries
sudo dnf update
sudo dnf install nss nss-utils nspr libxcb
A lot better than copying them over by hand!
It should just work right away after that. Make sure your main.py looks something like mine though.
Here is what my main Python script ended up looking like (main.py):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait


def visit_url(url):
    """
    Navigates to a given url.

    Args:
        url (str): The url of the site to visit (e.g., "https://stackexchange.com/").
    """
    print(f"Visiting {url}")
    driver.get(url)
    WebDriverWait(driver, 10).until(
        lambda driver: driver.execute_script('return document.readyState') == 'complete'
    )


if __name__ == '__main__':
    # Set up Chrome options
    options = Options()
    options.add_argument("--headless")  # Run Chrome in headless mode
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--remote-debugging-port=9222")
    options.binary_location = "chrome-linux64/chrome"  # ONLY for non-root install
    # Initialize the WebDriver
    driver = webdriver.Chrome(options=options, service=Service("chromedriver-linux64/chromedriver"))
    try:
        visit_url("https://stackoverflow.com/")
        # For debugging purposes (if you can even access it)
        driver.save_screenshot("stack_overflow.png")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Always close the browser
        print("Finished! Closing...")
        driver.close()
        driver.quit()
Getting it to recognize Chrome for the non-root install was finicky, since Chrome is not in its usual place. Still, this is a basic script you can base your program on. It saves a screenshot, and the remote debugging port is exposed at localhost:9222, although I'm not exactly sure how you would view it.
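On the debugging port: when Chrome runs with --remote-debugging-port, it serves a few plain HTTP endpoints there, such as /json/version (browser info) and /json/list (open targets). A minimal sketch for querying them from Python is below. The helper names are hypothetical, and whether the port is reachable from outside an MWAA worker is something I have not verified.

```python
import json
import urllib.request


def devtools_url(path, host="localhost", port=9222):
    """Build the URL of a DevTools HTTP endpoint such as 'json/version'."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_devtools_json(path, host="localhost", port=9222, timeout=5):
    """Fetch and decode a DevTools HTTP endpoint (requires a running browser)."""
    with urllib.request.urlopen(devtools_url(path, host, port), timeout=timeout) as response:
        return json.load(response)
```

While the browser is still open, fetch_devtools_json("json/version") should return a dictionary describing it. Since main.py quits right after taking the screenshot, you would need to keep the session alive (or call this from within the script) to see anything.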