How to access pages with basic authentication (Apify SDK)


In the Puppeteer documentation I found that I can use await page.authenticate({ username: 'test', password: 'test' }); to access pages with basic authentication.

But it seems that by the time handlePageFunction runs, the request has already been made.

So how can I do that?

Apify.main(async () => {

const requestQueue = await Apify.openRequestQueue(`PC_${settings.project}_${time}`);
await requestQueue.addRequest({ url: settings.baseUrl });

const crawler = new Apify.PuppeteerCrawler({
    requestQueue,
    launchPuppeteerOptions: {
        headless: settings.headless,
        // slowMo: 500,
    },
    maxRequestsPerCrawl: settings.maxurls,
    maxConcurrency: settings.maxcrawlers,
    handlePageFunction: async ({ request, response, page }) => {
        await page.authenticate({ username: 'test', password: 'test' });
        await page.waitFor(settings.waitForPageload);

        const requestUrl = request.url
        const loadUrl = request.loadedUrl
        let isRedirected = false

        if (requestUrl !== loadUrl) {
            isRedirected = { from: requestUrl, to: loadUrl }
        }

Solution

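  • You can manipulate the page before it is opened with gotoFunction. PuppeteerCrawler calls it in place of its default page.goto(), so credentials set there with page.authenticate() are in place before the request is sent.

    If you need to log in to a website, you can check this small login example.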

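    const crawler = new Apify.PuppeteerCrawler({
        gotoFunction: async ({ page, request }) => {
            // Set the credentials before navigating.
            await page.authenticate({ username: 'test', password: 'test' });
            // Return the result of page.goto(), as gotoFunction is expected to.
            return page.goto(request.url, { timeout: 120000 });
        },
        // ... other options such as requestQueue and handlePageFunction
    });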
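
The ordering described in this answer (authenticate first, then navigate) can be checked without launching a browser. The following is only a minimal sketch: createAuthenticatedGoto, createFakePage, and demo are hypothetical helpers that are not part of the Apify SDK or Puppeteer, and the example.com URL is a placeholder. The fake page only records which methods were called and in what order.

```javascript
// Hypothetical helper (not part of the Apify SDK): builds a function with the
// same shape as gotoFunction that sets basic-auth credentials, then navigates.
function createAuthenticatedGoto(credentials, gotoOptions = { timeout: 120000 }) {
    return async ({ page, request }) => {
        // Credentials must be set before page.goto() issues the request.
        await page.authenticate(credentials);
        // Return the navigation result, as gotoFunction is expected to.
        return page.goto(request.url, gotoOptions);
    };
}

// Minimal stand-in for a Puppeteer page (an assumption for illustration only).
// It records each call so the order can be inspected afterwards.
function createFakePage(calls) {
    return {
        authenticate: async (creds) => {
            calls.push(['authenticate', creds.username]);
        },
        goto: async (url, options) => {
            calls.push(['goto', url]);
            return { url, options };
        },
    };
}

// Runs the helper against the fake page and returns what happened.
async function demo() {
    const calls = [];
    const gotoFunction = createAuthenticatedGoto({ username: 'test', password: 'test' });
    const response = await gotoFunction({
        page: createFakePage(calls),
        request: { url: 'https://example.com/protected' }, // placeholder URL
    });
    return { calls, response };
}

demo().then(({ calls }) => console.log(calls));
```

In a real crawler, the function returned by createAuthenticatedGoto would be passed as the gotoFunction option of Apify.PuppeteerCrawler, next to requestQueue and handlePageFunction, and the fake page would not be needed.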