Tags: node.js, docker, dockerfile, puppeteer, docker-container

How to run Puppeteer and Node.js inside a Docker Container?


I have built a scraper using Puppeteer and Node.js, and now I want to dockerize it. I've tried several approaches, but I hit an error when Puppeteer tries to start the browser for scraping.

My current basic Dockerfile, without Puppeteer or any other dependencies, is below. I've tried updating it in many ways (adding Chrome, adding Puppeteer), but none of them worked.

# Use Node.js runtime as the base image
FROM node:18

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy package.json and package-lock.json to the working directory
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8080

# Command to run the application
CMD ["node", "scraper.js"]

Code snippet that launches the browser:

// Launch browser
const browser = await launch({ headless: true, defaultViewport: null });

Can someone help me get this working?

I have tried every approach suggested here, here and here.

Error encountered:

An error occurred during scraping:

Error: Failed to launch the browser process!
web-crawler-1  | rosetta error: failed to open elf at /lib64/ld-linux-x86-64.so.2
web-crawler-1  |  
web-crawler-1  | 
web-crawler-1  | 
web-crawler-1  | TROUBLESHOOTING: https://pptr.dev/troubleshooting
web-crawler-1  | 
web-crawler-1  |     at Interface.onClose (file:///usr/src/app/node_modules/@puppeteer/browsers/lib/esm/launch.js:301:24)
web-crawler-1  |     at Interface.emit (node:events:529:35)
web-crawler-1  |     at Interface.close (node:internal/readline/interface:534:10)
web-crawler-1  |     at Socket.onend (node:internal/readline/interface:260:10)
web-crawler-1  |     at Socket.emit (node:events:529:35)
web-crawler-1  |     at endReadableNT (node:internal/streams/readable:1400:12)
web-crawler-1  |     at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

Solution

  • This solution worked for me.

    To run Puppeteer inside a Docker container, you should install Google Chrome manually because, in contrast to the Chromium package offered by Debian, Chrome is only offered as the latest stable version.

    Install the browser in the Dockerfile:

    FROM node:18
    
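    # We don't need the standalone Chromium that Puppeteer downloads by default.
    # (Newer Puppeteer releases document PUPPETEER_SKIP_DOWNLOAD for this purpose.)
    ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true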
    
    # Install Google Chrome Stable and fonts
    # Note: this installs the necessary libs to make the browser work with Puppeteer.
    RUN apt-get update && apt-get install curl gnupg -y \
      && curl --location --silent https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
      && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
      && apt-get update \
      && apt-get install google-chrome-stable -y --no-install-recommends \
      && rm -rf /var/lib/apt/lists/*
    
    # Install your app here...
    

    Additionally, if you are on an ARM-based CPU (such as an Apple M1) like me, pass the --platform linux/amd64 argument when you build the Docker image, because the Dockerfile above installs the amd64 build of Chrome.

    Build command: docker build --platform linux/amd64 -t <image-name> .

    Note: after updating your Dockerfile, update the Puppeteer script as well. When launching the browser, set executablePath to the Chrome binary that was just installed in the image.

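    const browser = await launch({
      headless: true,
      defaultViewport: null,
      // Use the Chrome installed by the Dockerfile instead of a downloaded browser
      executablePath: '/usr/bin/google-chrome',
      // Chrome's sandbox does not work when running as root, the default user in this image
      args: ['--no-sandbox'],
    });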
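
If the same script also has to run outside the container, one possible approach is to derive the launch options from the environment rather than hard-coding the Chrome path. The following is only a minimal sketch: CHROME_BIN is a hypothetical environment variable name, and buildLaunchOptions and archWarning are illustrative helpers, not part of Puppeteer. The second helper reports when the Node.js process is not running as x64, which is the situation the --platform linux/amd64 flag is meant to avoid.

```javascript
const os = require('node:os');

// Build Puppeteer launch options from the environment.
// CHROME_BIN is a hypothetical variable name chosen for this sketch.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true, defaultViewport: null };
  if (env.CHROME_BIN) {
    // Inside the container: point at the Chrome installed by the Dockerfile
    options.executablePath = env.CHROME_BIN;
    // Same flag as in the answer above, needed when running as root
    options.args = ['--no-sandbox'];
  }
  return options;
}

// Return a warning string when the process architecture is not x64,
// since the Dockerfile above installs the amd64 build of Chrome.
function archWarning(arch = os.arch()) {
  if (arch === 'x64') return null;
  return `Process architecture is ${arch}, but the image installs amd64 Chrome; ` +
    'rebuild with --platform linux/amd64.';
}

// Usage (assumes `launch` is imported from puppeteer, as in the question):
//   const warning = archWarning();
//   if (warning) console.warn(warning);
//   const browser = await launch(buildLaunchOptions());
```

With this sketch, the Dockerfile would set the assumed variable (for example ENV CHROME_BIN=/usr/bin/google-chrome), while a local run without it falls back to Puppeteer's default browser.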