I have been trying to run a Selenium script in a Docker container, and I get the following error when the script runs:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/fyp_project/web_scraper/views.py", line 9, in user_interface
    scrape_reviews(product_url)
  File "/code/fyp_project/web_scraper/helpers.py", line 34, in scrape_reviews
    browser = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 209, in __init__
    self.start_session(capabilities)
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 293, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 348, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
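ChromeDriver's message only reports that Chrome exited; Chrome's own error output is not shown. One way to see it is to start the browser directly inside the container, without Selenium or ChromeDriver. The sketch below assumes Chrome is at /usr/bin/google-chrome (the path used in the code further down); build_probe_command and probe_chrome are hypothetical helper names. --dump-dom is a headless-mode switch that prints the loaded page's DOM and then exits.

```python
import shutil
import subprocess


def build_probe_command(chrome_path, no_sandbox=True):
    """Build a command that starts headless Chrome once, prints the DOM of a
    blank page and exits. No ChromeDriver is involved."""
    cmd = [chrome_path, "--headless", "--disable-dev-shm-usage"]
    if no_sandbox:
        cmd.append("--no-sandbox")
    cmd += ["--dump-dom", "about:blank"]
    return cmd


def probe_chrome(chrome_path="/usr/bin/google-chrome", no_sandbox=True, timeout=60):
    """Run Chrome directly and return (exit code, stderr), so the real reason
    for a crash is visible instead of ChromeDriver's generic message."""
    if shutil.which(chrome_path) is None:
        raise FileNotFoundError(f"Chrome not found at {chrome_path}")
    result = subprocess.run(
        build_probe_command(chrome_path, no_sandbox),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr
```

Calling probe_chrome(no_sandbox=False) and probe_chrome(no_sandbox=True) inside the container (for example from a Python shell opened with docker exec) and comparing the exit codes and stderr shows whether the sandbox is what keeps Chrome from starting.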
I have looked through various posts on Stack Overflow for this error, and I changed my Python code to the following:
def scrape_reviews(product_url):
    chrome_path = '/usr/bin/google-chrome'
    filename = 'daraz_reviews.csv'
    # open the browser
    # service = Service(executable_path=path)
    options = webdriver.ChromeOptions()
    options.binary_location = chrome_path
    options.add_argument("--disable-gpu")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-infobars")
    options.add_argument("--start-maximized")
    options.add_argument("--disable-notifications")
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--disable-setuid-sandbox")
    browser = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))  # Error at this line
Even after adding all these options to my Python code, I still get the same error. I would appreciate it if someone could help me troubleshoot this.
My Dockerfile is also given below:
# Use an official Python runtime as a parent image
FROM python:3.12
# Allows docker to cache installed dependencies between builds
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Install dependencies for Chrome and Chromedriver
RUN apt-get update && apt-get install -y \
wget \
unzip \
libnss3 \
&& rm -rf /var/lib/apt/lists/*
# Download Google Chrome dependencies
# Chrome dependency installation
RUN apt-get update && apt-get install -y \
fonts-liberation \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
libatspi2.0-0 \
libcups2 \
libdbus-1-3 \
libdrm2 \
libgbm1 \
libgtk-3-0 \
libgtk-4-1 \
libnspr4 \
libnss3 \
libwayland-client0 \
libxcomposite1 \
libxdamage1 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
xdg-utils \
libu2f-udev \
libvulkan1
#Download Google Chrome
RUN apt -f install -y
RUN apt-get install -y wget
RUN wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i ./google-chrome-stable_current_amd64.deb
#Display the path to google chrome
RUN which google-chrome
# Download and install Chromedriver
RUN wget -O /tmp/chromedriver.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/120.0.6099.71/linux64/chromedriver-linux64.zip && \
unzip /tmp/chromedriver.zip -d /usr/local/bin/ && \
rm /tmp/chromedriver.zip
# Set the PATH environment variable
ENV PATH="/usr/local/bin:${PATH}"
# Mount the application code to the image
COPY . /code/fyp_project/
WORKDIR /code
EXPOSE 8000
# Run the production server
ENTRYPOINT ["python", "fyp_project/manage.py"]
CMD ["runserver", "0.0.0.0:8000"]
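This Dockerfile installs whichever google-chrome-stable is current at build time but pins ChromeDriver to 120.0.6099.71, so a later rebuild can pair a newer Chrome with an older driver, and ChromeDriver expects the same Chrome major version. Below is a minimal sketch for checking that, assuming the usual output formats "Google Chrome 120.0.6099.71" from google-chrome --version and "ChromeDriver 120.0.6099.71 (...)" from chromedriver --version; major_version, versions_match and read_version are hypothetical helper names.

```python
import re
import shutil
import subprocess

# Matches a four-part version such as "120.0.6099.71" and captures the major part.
_VERSION_RE = re.compile(r"\b(\d+)\.\d+\.\d+\.\d+\b")


def major_version(version_output):
    """Extract the major version from `google-chrome --version` or
    `chromedriver --version` output."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def read_version(binary):
    """Run `<binary> --version` and return its stdout."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(binary)
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return result.stdout
```

The zip is unpacked into a chromedriver-linux64 folder under /usr/local/bin, which is not itself on PATH, so the driver needs its full path: versions_match(read_version("google-chrome"), read_version("/usr/local/bin/chromedriver-linux64/chromedriver")). Since scrape_reviews obtains its driver from ChromeDriverManager().install(), the pinned binary may not be used at all; the check matters only if Service is pointed at it.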
So, after being stuck on this issue for nearly a week, this is the code that finally allowed me to run the Selenium web scraper in Docker. Pay special attention to the --headless and --no-sandbox arguments. I also had to pass options as the options parameter of webdriver.Chrome(); in my earlier code the options were built but never passed to the driver.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def scrape_reviews(product_url):
    path = '/usr/bin/chromedriver-linux64/chromedriver'
    chrome_path = '/usr/bin/google-chrome'
    filename = 'daraz_reviews.csv'
    # open the browser
    service = Service(executable_path=path)  # not used below: webdriver.Chrome() gets its driver from ChromeDriverManager
    options = webdriver.ChromeOptions()
    options.binary_location = chrome_path
    options.add_argument("--start-maximized")  # open browser in maximized mode
    options.add_argument("--disable-infobars")  # disable infobars
    options.add_argument("--disable-extensions")  # disable extensions
    options.add_argument("--disable-gpu")  # applicable to Windows OS only
    options.add_argument("--disable-dev-shm-usage")  # overcome limited resource problems
    options.add_argument("--no-sandbox")  # bypass OS security model
    options.add_argument("--headless")
    browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)  # added options parameter
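A minimal sketch of the same setup as a reusable helper, not the author's code: it keeps the container flags in one place so they cannot be built and then forgotten (the mistake in the question's code), normalizes each flag to the --name form, and always quits the browser so Chrome processes do not pile up in the container. CONTAINER_CHROME_ARGS, chrome_args and headless_chrome are hypothetical names, and it assumes the selenium and webdriver-manager packages already used above.

```python
from contextlib import contextmanager

# Flags commonly needed to run Chrome inside a Docker container.
CONTAINER_CHROME_ARGS = (
    "--headless",               # there is no display server in the container
    "--no-sandbox",             # Chrome refuses to run sandboxed as root, the image's default user
    "--disable-dev-shm-usage",  # use /tmp instead of Docker's small /dev/shm
    "--disable-gpu",
)


def chrome_args(extra_args=()):
    """Return the container flags plus any extras, each with a leading "--",
    without duplicates and in their original order."""
    result = []
    for arg in (*CONTAINER_CHROME_ARGS, *extra_args):
        if not arg.startswith("--"):
            arg = "--" + arg
        if arg not in result:
            result.append(arg)
    return result


@contextmanager
def headless_chrome(binary_location="/usr/bin/google-chrome", extra_args=()):
    """Start Chrome with the options actually passed to webdriver.Chrome(),
    and quit the browser when the with-block ends."""
    # Imported here so chrome_args() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    options.binary_location = binary_location
    for arg in chrome_args(extra_args):
        options.add_argument(arg)
    browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        yield browser
    finally:
        browser.quit()
```

Inside scrape_reviews this would be used as: with headless_chrome(extra_args=["start-maximized"]) as browser: browser.get(product_url).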
I hope this helps someone out there.