python docker windows-10

Unable to run a Selenium script in a Docker container


I have been trying to run a Selenium script in a Docker container. I am getting the following error while trying to run the script:

Traceback (most recent call last):
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/django/core/handlers/exception.py", line 55, in inner
2023-12-12 15:44:32     response = get_response(request)
2023-12-12 15:44:32                ^^^^^^^^^^^^^^^^^^^^^
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/django/core/handlers/base.py", line 197, in _get_response
2023-12-12 15:44:32     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2023-12-12 15:44:32                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-12 15:44:32   File "/code/fyp_project/web_scraper/views.py", line 9, in user_interface
2023-12-12 15:44:32     scrape_reviews(product_url)
2023-12-12 15:44:32   File "/code/fyp_project/web_scraper/helpers.py", line 34, in scrape_reviews
2023-12-12 15:44:32     browser = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
2023-12-12 15:44:32               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
2023-12-12 15:44:32     super().__init__(
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 61, in __init__
2023-12-12 15:44:32     super().__init__(command_executor=executor, options=options)
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 209, in __init__
2023-12-12 15:44:32     self.start_session(capabilities)
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 293, in start_session
2023-12-12 15:44:32     response = self.execute(Command.NEW_SESSION, caps)["value"]
2023-12-12 15:44:32                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 348, in execute
2023-12-12 15:44:32     self.error_handler.check_response(response)
2023-12-12 15:44:32   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
2023-12-12 15:44:32     raise exception_class(message, screen, stacktrace)
2023-12-12 15:44:32 selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
2023-12-12 15:44:32   (session not created: DevToolsActivePort file doesn't exist)
2023-12-12 15:44:32   (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

I have looked through various posts on Stack Overflow for this error, and I changed my Python code to the following:

def scrape_reviews(product_url):
    chrome_path='/usr/bin/google-chrome'
    filename='daraz_reviews.csv'
    #open the browser
    #service = Service(executable_path=path)
    options = webdriver.ChromeOptions()
    options.binary_location=chrome_path
    options.add_argument("--disable-gpu")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-infobars")
    options.add_argument("--start-maximized")
    options.add_argument("--disable-notifications")
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--disable-setuid-sandbox") 
    browser = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install())) #Error at this line

Even after adding all these options to my Python code, I still get the same error. I would appreciate it if someone could help me troubleshoot this.

My Dockerfile is also given below:

# Use an official Python runtime as a parent image
FROM python:3.12

# Allows docker to cache installed dependencies between builds
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Install dependencies for Chrome and Chromedriver
RUN apt-get update && apt-get install -y \
    wget \
    unzip \
    libnss3 \
    && rm -rf /var/lib/apt/lists/*

# Install Google Chrome dependencies
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libatspi2.0-0 \
    libcups2 \
    libdbus-1-3 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libgtk-4-1 \
    libnspr4 \
    libnss3 \
    libwayland-client0 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxkbcommon0 \
    libxrandr2 \
    xdg-utils \
    libu2f-udev \
    libvulkan1

#Download Google Chrome
RUN apt -f install -y
RUN apt-get install -y wget
RUN wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i ./google-chrome-stable_current_amd64.deb 

#Display the path to google chrome
RUN which google-chrome

# Download and install Chromedriver
RUN wget -O /tmp/chromedriver.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/120.0.6099.71/linux64/chromedriver-linux64.zip && \
   unzip /tmp/chromedriver.zip -d /usr/local/bin/ && \
   rm /tmp/chromedriver.zip


# Set the PATH environment variable
ENV PATH="/usr/local/bin:${PATH}"

# Copy the application code into the image
COPY . /code/fyp_project/
WORKDIR /code

EXPOSE 8000

# Run the production server
ENTRYPOINT ["python", "fyp_project/manage.py"]
CMD ["runserver", "0.0.0.0:8000"]

Solution

  • After being stuck on this issue for nearly a week, this is the code that finally let me run the Selenium web scraper in Docker. The important parts are the --headless and --no-sandbox arguments. I also had to pass options as a parameter to webdriver.Chrome(); in my earlier code the options object was built but never handed to the driver, so none of the flags took effect.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    def scrape_reviews(product_url):
        chrome_path = '/usr/bin/google-chrome'
        filename = 'daraz_reviews.csv'
        # Configure Chrome so it can start inside the container
        options = webdriver.ChromeOptions()
        options.binary_location = chrome_path
        options.add_argument("--start-maximized")  # open browser in maximized mode
        options.add_argument("--disable-infobars")  # disable infobars
        options.add_argument("--disable-extensions")  # disable extensions
        options.add_argument("--disable-gpu")  # applicable to Windows OS only
        options.add_argument("--disable-dev-shm-usage")  # overcome limited resource problems
        options.add_argument("--no-sandbox")  # bypass OS security model
        options.add_argument("--headless")  # no display is available in the container
        # To use a driver already installed in the image instead of webdriver-manager:
        # service = Service(executable_path='/usr/bin/chromedriver-linux64/chromedriver')
        # Open the browser; the options parameter is the part that was missing before
        browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    

    I hope this helps someone out there.
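
The same fix can be sketched with the flag list kept separate from the driver setup, which makes it easier to check which arguments Chrome actually receives. This is only a rough sketch under a few assumptions: the names CONTAINER_FLAGS, build_chrome_arguments and create_browser are hypothetical and not part of Selenium, the flags are the ones listed in the answer above, and the Chrome path is the one from the question.

```python
# Hypothetical helper names; the flags themselves come from the answer above.
CONTAINER_FLAGS = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--disable-extensions",
]


def build_chrome_arguments(extra=()):
    """Return the container flags followed by any extra flags, without duplicates."""
    arguments = list(CONTAINER_FLAGS)
    for flag in extra:
        if flag not in arguments:
            arguments.append(flag)
    return arguments


def create_browser(chrome_path="/usr/bin/google-chrome"):
    """Start Chrome with the flags applied (needs selenium and webdriver-manager)."""
    # Imported here so the flag helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_path
    for flag in build_chrome_arguments():
        options.add_argument(flag)

    service = Service(ChromeDriverManager().install())
    # Passing options= is the step the code in the question left out.
    return webdriver.Chrome(service=service, options=options)
```

Inside scrape_reviews, the browser would then be created with browser = create_browser(). Printing build_chrome_arguments() before starting the driver is a quick way to confirm that --headless and --no-sandbox are really in the list.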