typescript puppeteer chromium user-agent

Changing user-agent on puppeteer-extra doesn't seem to take effect


I'm trying to scrape different websites using puppeteer. As I'm using puppeteer-extra for that (for their stealth-plugin), I've decided to use their anonymize-ua plugin to randomly change the default user-agent to further reduce detection.

I tried following their explanation, but when I log the browser's actual user-agent, the change doesn't seem to have taken effect.

Attached below is an example for what I'm doing:

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';

const scrape = async (url: string) => {
    // Set stealth plugin
    const stealthPlugin = StealthPlugin();
    puppeteer.use(stealthPlugin);

    // Create random user-agent to be set through plugin
    const userAgent = new UserAgent({ platform: 'MacIntel', deviceCategory: 'desktop' });
    const userAgentStr = userAgent.toString();
    console.log(`User Agent: ${userAgentStr}`);

    const anonymizeUserAgentPlugin = require('puppeteer-extra-plugin-anonymize-ua')({
        customFn: () => userAgentStr 
    });
    puppeteer.use(anonymizeUserAgentPlugin);

    puppeteer
        .launch({ headless: false })
        .then(async (browser) => {
            // Different from the one above
            console.log(`User Agent: ${await browser.userAgent()}`);
        })
        .catch((e) => console.log(e));
}

The first logged user-agent string is randomized from run to run by the user-agents library, but the second one, logged after the browser launches, is still the default user-agent of the running Chromium build.

Am I missing some configuration, or should I not be checking the browser's user-agent this way?


Solution

  • After some digging inside puppeteer-extra and the anonymize-ua plugin code, I found out:

    1. The user-agent is changed on the page instance, so browser.userAgent() does not report the one actually in use. The right way to check it is to log navigator.userAgent in the DevTools console.
    2. There's an open issue on puppeteer: events are not triggered early enough for listeners (e.g. plugins using onPageCreated) to modify the page instance (e.g. its user-agent) before the browser's first request goes out. It seems they tried to work around it by first navigating to about:blank. That workaround did not solve it for me; the user-agent was still unchanged.
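The difference described in point 1 can also be checked from code instead of the DevTools console. The following is a minimal sketch, not the plugin's own logic: `compareUserAgents`, `PageLike`, `BrowserLike` and `UserAgentReport` are hypothetical names introduced here, and the two interfaces only declare the puppeteer methods the sketch relies on (`browser.userAgent()`, `page.setUserAgent()`, `page.evaluate()` with a string expression), on the assumption that real puppeteer objects can be passed in their place.

```typescript
// Minimal structural types so the sketch type-checks without puppeteer installed.
// Assumption: puppeteer's Browser and Page objects satisfy these shapes.
interface PageLike {
    setUserAgent(userAgent: string): Promise<void>;
    evaluate(expression: string): Promise<unknown>;
}

interface BrowserLike {
    userAgent(): Promise<string>;
}

interface UserAgentReport {
    browserLevel: string; // what browser.userAgent() reports
    pageLevel: string;    // what scripts running in the page see
}

// Hypothetical helper: overrides the user-agent on one page, then reads back
// both the browser-level value and the page-level value for comparison.
async function compareUserAgents(
    browser: BrowserLike,
    page: PageLike,
    userAgentStr: string
): Promise<UserAgentReport> {
    await page.setUserAgent(userAgentStr);
    const browserLevel = await browser.userAgent();
    // A string passed to evaluate() is run as an expression in the page context.
    const pageLevel = String(await page.evaluate('navigator.userAgent'));
    return { browserLevel, pageLevel };
}
```

If the override took effect, `pageLevel` should match `userAgentStr` while `browserLevel` keeps showing the Chromium default. Depending on the puppeteer version, the page may need a navigation after the override before `navigator.userAgent` reflects it.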

    So my solution was to duplicate the code from the plugin and set the generated user-agent on the page:

    puppeteer
        .launch({ headless: false })
        .then(async (browser) => {
            try {
                // The browser opens with one initial page; set the user-agent on it directly
                const [page] = await browser.pages();
                await page.setUserAgent(userAgentStr);
            } catch (e) {
                console.log(e);
                await browser.close();
            }
        })
        .catch((e) => console.log(e));
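
The snippet above only covers the page that already exists when the browser starts. For pages opened later, the same idea could be wrapped in a small helper that sets the user-agent before the first navigation. This is a sketch under assumptions: `openWithUserAgent`, `NavigablePage` and `PageFactory` are hypothetical names, and the interfaces only declare the puppeteer methods used here (`browser.newPage()`, `page.setUserAgent()`, `page.goto()`).

```typescript
// Minimal structural types; assumption: puppeteer's Page and Browser fit them.
interface NavigablePage {
    setUserAgent(userAgent: string): Promise<void>;
    goto(url: string): Promise<unknown>;
}

interface PageFactory {
    newPage(): Promise<NavigablePage>;
}

// Hypothetical helper: opens a new page and applies the user-agent
// before any request for the target URL is sent.
async function openWithUserAgent(
    browser: PageFactory,
    url: string,
    userAgentStr: string
): Promise<NavigablePage> {
    const page = await browser.newPage();
    await page.setUserAgent(userAgentStr); // must happen before goto()
    await page.goto(url);
    return page;
}
```

Setting the user-agent explicitly between `newPage()` and `goto()` avoids relying on page-creation events firing early enough, which is the timing problem described in point 2.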
    

    Hope this helps someone!