I am working on a web scraper with Next.js and Puppeteer. Everything works well on localhost, but the version deployed to Vercel returns a 500 Internal Server Error whenever the route that launches the browser is called. I've looked at some guides on deploying a serverless Puppeteer function to Vercel, and some suggested Playwright instead, but that doesn't work when deployed either. Here is the code for my current attempt (using playwright-core with chrome-aws-lambda), and here is the GitHub repo: https://github.com/hellolol2016/EquilibriNews
import chromium from "chrome-aws-lambda";
import playwright from "playwright-core";

// FUNCTION TO RUN SEPARATE SCRAPE FUNCTIONS
async function scrapeInfiniteScrollItems(page, getNews, src) {
  let items = {};
  try {
    items = await page.evaluate(getNews);
  } catch (e) {
    console.log(e);
    console.log("bad source", src);
  }
  return items;
}

// FUNCTION TO SET UP BROWSER AND RETURN
export default async function handler(req, res) {
  // Declared here so the snippet is self-contained
  const allArticles = {};
  const browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath:
      process.env.NODE_ENV !== "development"
        ? await chromium.executablePath
        : "/usr/bin/chromium",
    headless: process.env.NODE_ENV !== "development" ? chromium.headless : true,
  });
  // Playwright takes these as page options; setJavaScriptEnabled() and
  // setViewport() are Puppeteer methods and do not exist on a Playwright page
  const page = await browser.newPage({
    javaScriptEnabled: false,
    viewport: { width: 1280, height: 3000 },
  });
  await page.goto("https://www.foxnews.com/politics");
  let items = await scrapeInfiniteScrollItems(page, extractFox, "fox");
  // NOTE: I didn't include the extractFox function because it doesn't use any puppeteer functions
  allArticles.fox = items;
  await browser.close();
  res.status(200).json(allArticles);
}
I've tried following other write-ups on this, such as https://puppeteer-screenshot-demo.vercel.app/?page=https://whitep4nth3r.com (this one uses a deprecated version of Node) and https://ndo.dev/posts/link-screenshot (this is what I'm trying right now).
I'm guessing the solution is to install a different library that works similarly to playwright / puppeteer / chrome-aws-lambda but can still run when deployed as a serverless function on Vercel.
I followed this article and got it running on Vercel:
https://www.stefanjudis.com/blog/how-to-use-headless-chrome-in-serverless-functions/
I believe your issue is that Chromium is too large to run in a serverless function (50 MB size limit).
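One rough way to see which dependency is eating into that limit is to measure the installed packages on disk. The script below is only a sketch: `dirSizeBytes` is a hypothetical helper name, and on-disk size is just a proxy, since the limit is applied to the bundled function rather than to the raw `node_modules` folder.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: recursively sums the size of all regular files under `dir`.
// Symlinks are skipped, so linked packages are not followed or double-counted.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSizeBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}
```

For example, `dirSizeBytes("node_modules/chrome-aws-lambda") / 1e6` gives that package's approximate size in MB. If a full `puppeteer` or `playwright` install shows up alongside the `-core` package, that is a likely candidate for removal.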
If you make the changes from that article and it still doesn't work, check your deployment logs to see whether the serverless function is hitting the 10-second execution time limit.
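To make a timeout easier to spot in the logs, you could race the slow steps against a timer of your own that fires before the platform limit does. This is a minimal sketch, not part of the article: `withTimeout` is a hypothetical helper, and the 8000 ms budget in the usage note is an assumed value chosen to stay under 10 seconds.

```javascript
// Hypothetical helper: resolves with the result of `promise`, or rejects with a
// descriptive error if it has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the handler this would look like `await withTimeout(page.goto(url), 8000, "page.goto")` inside a `try`/`catch` that logs the error and returns a 504, so the log shows which step ran out of time instead of the function being killed silently. Note that it only stops waiting; it does not cancel the underlying navigation, so the browser should still be closed in the error path.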