How to reduce Puppeteer size


I'm using Puppeteer for web scraping in a small Node.js web app that I built. The app is hosted on Heroku and uses jontewks/puppeteer-heroku-buildpack to work.

The problem I'm facing is that my app no longer builds because of the Heroku slug size limit:

Compiled slug size: 537.4M is too large (max is 500M).

I've tried several things:

  • Using Firefox instead of Chromium
  • Reducing the size of Chromium by removing the file interactive_ui_tests.exe
    • I can't do this because Heroku uses Linux instead of Windows, and this file does not exist in the Linux Chromium distribution
  • Using headless_shell instead of Chromium
    • I'm stuck on this (like here), as I do not understand how to make it work. I found the file to use here, but I'm facing the same issue as the comment from 07/09/2018
  • Using Playwright instead of Puppeteer
    • It might be a solution, but I'm using things like puppeteer-extra and puppeteer-extra-plugin-stealth, so I'd rather not switch
  • Reducing the size of Chromium by removing the folder locales
    • It helps a bit, but not much
  • Using an older version of Puppeteer (2.1.1), which ships an older version of Chromium that was slightly lighter
    • At the moment, it's the only working solution that I have
  • Running the commands heroku repo:gc -a myapp and heroku builds:cache:purge -a myapp

My last three points reduced the size of my slug to 490M. So my app is working, but this won't hold up for long, since it prevents me from using an up-to-date Puppeteer version.
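Before removing more files, it can help to see which directories actually take up the space. The following is a minimal sketch, not part of the original setup: dirSize and largestEntries are hypothetical helper names, and the script only uses Node's built-in fs and path modules.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sum the size in bytes of all regular files under `dir`.
// Symbolic links are skipped so that nothing is counted twice.
function dirSize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// List the immediate children of `root` (hypothetical helper), largest first.
function largestEntries(root, limit = 10) {
  return fs.readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() || entry.isFile())
    .map((entry) => {
      const full = path.join(root, entry.name);
      const bytes = entry.isDirectory() ? dirSize(full) : fs.statSync(full).size;
      return { name: entry.name, bytes };
    })
    .sort((a, b) => b.bytes - a.bytes)
    .slice(0, limit);
}
```

Calling largestEntries('node_modules') locally, or in a one-off dyno, returns the heaviest entries first, which shows whether the browser download or something else dominates the slug.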

So here I am, asking for help, as I do not have any more ideas at the moment.

Thank you very much for your help 🙏


Solution

  • In the end, I switched to Playwright.

    With this buildpack, the build of my app is only 250 MB!

    Here are the steps I followed:

    • Install playwright-chromium with npm so that only Chromium is downloaded.

    • Set the PLAYWRIGHT_BUILDPACK_BROWSERS env variable to chromium in Heroku so that only Chromium's dependencies are installed.

    • Put this buildpack before the Node.js buildpack in Heroku.

    • With this trick you can use most of the features from puppeteer-stealth.

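    • If you want, you can block resource types, as you would in Puppeteer:

    // Abort every request whose resource type is listed; let the rest through.
    // 'script' is commented out so that page JavaScript still runs.
    await page.route('**/*', route => ([
        'stylesheet',
        'image',
        'media',
        'font',
        // 'script',
        'texttrack',
        'xhr',
        'fetch',
        'eventsource',
        'websocket',
        'manifest',
        'other',
    ].includes(route.request().resourceType()) ? route.abort() : route.continue()));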
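
    To show how the steps above might fit together, here is a minimal sketch of a scraping helper built on playwright-chromium. The names shouldBlock and fetchTitle are hypothetical, and passing chromiumSandbox: false is an assumption about what a Heroku dyno needs rather than something stated above; adjust both to your own app.

```javascript
// Resource types to abort, taken from the list above.
// 'document' and 'script' are not listed, so pages still load and run their JavaScript.
const BLOCKED_RESOURCE_TYPES = new Set([
  'stylesheet', 'image', 'media', 'font', 'texttrack', 'xhr',
  'fetch', 'eventsource', 'websocket', 'manifest', 'other',
]);

// Pure helper (hypothetical name): decide whether a request should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Hypothetical helper: open `url` in headless Chromium and return the page title.
async function fetchTitle(url) {
  // Required lazily so shouldBlock can be used or tested without a browser installed.
  const { chromium } = require('playwright-chromium');
  // Assumption: the Chromium sandbox is left disabled, as is common in containers.
  const browser = await chromium.launch({ chromiumSandbox: false });
  try {
    const page = await browser.newPage();
    // Intercept every request and abort the blocked resource types.
    await page.route('**/*', (route) =>
      shouldBlock(route.request().resourceType()) ? route.abort() : route.continue());
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.title();
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

    Keeping the decision in a small pure function makes the block list easy to adjust. For example, if a site loads its content through XHR or fetch calls, those two entries would need to be removed from the set.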