Node.js app with Express, deployed on Heroku. It's just dynamic webpages. Loading static webpages works fine.
Loading dynamic webpages works on localhost, but on Heroku it throws code=H12, desc="Request timeout", service=30000ms, status=503.
In addition, right after running heroku restart or making a deployment, there always seems to be one request with status=200 that loads only the static portion of a dynamic webpage.
Screenshot of logs here.
I've tried the following, all of which led to either the same or other unexpected results when deployed on Heroku (such as Error R14 (Memory quota exceeded) and code=H13 desc="Connection closed without response"):
- Setting headless: true in Puppeteer's launch arguments.
- Adding the --no-sandbox, --disable-setuid-sandbox, --single-process, and --no-zygote flags in args of Puppeteer's launch arguments. (Reference: this comment & this comment)
- Setting the waitUntil argument in Puppeteer's goto function to domcontentloaded, networkidle0 and networkidle2. (Reference: this comment)
- Setting the timeout argument in Puppeteer's goto function; I've tried 30000 and 60000 specifically, as well as 0 per this comment.
- Using the waitForSelector function.
- Printing the url variable (see my code below) in the console. Output is as expected.

I've observed that:

- The try-catch-finally block never catches any error. It's always one of the following: I get an incomplete result (static portion of requested dynamic webpage), or the app crashes (code=H13 desc="Connection closed without response"). So I haven't been able to get anything out of attempting to print exception in the console from within the catch block.

Any ideas on how I could get this to work?
const express = require("express");
const puppeteer = require("puppeteer");
const app = express();
let port = process.env.PORT || 3000;
let browser;
...
app.listen(port, async () => {
  browser = await puppeteer.launch({
    timeout: 0,
    headless: true,
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--single-process",
      "--no-zygote",
    ],
  });
});
...
app.get("/appropriate-route-name", async (req, res) => {
  let url = req.query.url;
  let page = await browser.newPage();
  try {
    await page.goto(url, {
      waitUntil: "networkidle2",
    });
    res.send({ data: await page.content() });
  } catch (exception) {
    res.send({ data: null });
  } finally {
    await browser.close();
  }
});
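A note on why the catch block may appear never to run: H12 means Heroku's router gave up after 30 seconds (the service=30000ms in the log), and Puppeteer's default navigation timeout is also 30 seconds, so the router can cut the request off before goto has a chance to throw. A minimal sketch of one way to surface the failure earlier, assuming a hypothetical helper named withDeadline (not part of Puppeteer or Express) and an arbitrary 25-second budget:

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`deadline of ${ms} ms exceeded`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so a finished request does not leave a pending timeout behind.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Possible use inside the route handler (25000 is an assumed budget below
// Heroku's 30-second limit):
//   await withDeadline(page.goto(url, { waitUntil: "networkidle2" }), 25000);
```

With something like this in place, a slow page should end up in the catch block, where the exception can be logged, instead of ending as an H12 in the router log. Passing a timeout below 30000 to goto directly would have a similar effect for the navigation step alone.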
I was able to get it to work by using the user-agents package. Dynamic pages now load just fine on Heroku; requests no longer time out every single time.
const express = require("express");
const puppeteer = require("puppeteer");
const UserAgent = require("user-agents"); // exports a class that generates user-agent strings
const app = express();
let port = process.env.PORT || 3000;
...
app.get("/route-name", async (req, res) => {
  let url = req.query.url;
  let browser = await puppeteer.launch({
    args: ["--no-sandbox"],
  });
  let page = await browser.newPage();
  try {
    // added this: send a realistic, randomly generated user-agent string
    await page.setUserAgent(new UserAgent().toString());
    await page.goto(url, {
      timeout: 30000,
      waitUntil: "networkidle2", // or "networkidle0", depending on what you need
    });
    res.send({ data: await page.content() });
  } catch (e) {
    res.send({ data: null });
  } finally {
    await browser.close();
  }
});
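One difference from the code in the question is worth spelling out: here the browser is launched inside the handler, so closing it in finally is safe. In the question, a single browser launched at startup was closed after the first request, which may explain why later requests never reached the try block. If launching Chromium per request turns out to be too heavy for the dyno (the R14 errors hint at that), a possible variation is to keep one browser and close only the tab. This is a sketch, not something I have deployed; renderHtml is a hypothetical helper name, and the 25-second timeout is an assumed value below Heroku's 30-second limit:

```javascript
// Hypothetical helper: render `url` in a fresh tab of an already-launched browser.
// `browser` is assumed to be a Puppeteer Browser (anything with newPage() works).
async function renderHtml(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { timeout: 25000, waitUntil: "networkidle2" });
    return await page.content();
  } finally {
    // Close the tab, not the browser, so later requests can reuse the browser.
    await page.close();
  }
}

// Possible use in the route, with `browser` launched once at startup:
//   res.send({ data: await renderHtml(browser, req.query.url) });
```

Errors from goto still propagate to the caller, so the route's own try-catch can keep responding with data: null.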