I'm trying to scrape different websites using puppeteer. Since I'm already using puppeteer-extra (for its stealth plugin), I decided to use its anonymize-ua plugin to randomly change the default user-agent and further reduce the chance of detection.
I tried following their explanation, but when I log the browser's actual user-agent, the change doesn't seem to have taken effect.
Below is an example of what I'm doing:
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';

const scrape = async (url: string) => {
  // Set stealth plugin
  const stealthPlugin = StealthPlugin();
  puppeteer.use(stealthPlugin);

  // Create random user-agent to be set through plugin
  const userAgent = new UserAgent({ platform: 'MacIntel', deviceCategory: 'desktop' });
  const userAgentStr = userAgent.toString();
  console.log(`User Agent: ${userAgentStr}`);

  const anonymizeUserAgentPlugin = require('puppeteer-extra-plugin-anonymize-ua')({
    customFn: () => userAgentStr
  });
  puppeteer.use(anonymizeUserAgentPlugin);

  puppeteer
    .launch({ headless: false })
    .then(async (browser) => {
      // Different from the one above
      console.log(`User Agent: ${await browser.userAgent()}`);
    })
    .catch((e) => console.log(e));
}
Although the first user-agent string is randomized (from run to run) by the user-agents library, the second one, logged after launching the browser, is the default user-agent of the Chromium version actually running.
Am I missing some configuration? Or shouldn't I be looking at the browser's user-agent like that?
After some digging inside puppeteer-extra and the anonymize-ua plugin code, I found out two things:
1. The plugin sets the user-agent on the page instance, so looking at the one reported by the browser will not show the one actually used. The right way to check is to log navigator.userAgent in the DevTools console.
2. The plugin has to hook in early enough (onPageCreated) to be able to modify the page instance (e.g. its user-agent) before the first browser request occurs. It seems they tried to work around this by first navigating to about:blank. That workaround did not solve it for me: the user-agent was still not changed.
So my solution was to duplicate the code from the plugin and set the generated user-agent on the page directly:
puppeteer
  .launch({ headless: false })
  .then(async (browser) => {
    browser
      .pages()
      .then(async ([page]) => {
        await page.setUserAgent(userAgentStr);
      })
      .catch(async (e) => {
        console.log(e);
        await browser.close();
      });
  })
  .catch((e) => console.log(e));
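To check which user-agent a page actually reports, the value can be read from the page itself instead of from the browser object. The following is a minimal sketch, not the plugin's code: PageLike and setAndReadUserAgent are hypothetical names introduced here, and the sketch assumes only that the page object offers setUserAgent and an evaluate that accepts a string expression, as Puppeteer's Page does.

```typescript
// Hypothetical structural type: only the two page methods this sketch needs.
// A real Puppeteer Page is expected to fit this shape.
interface PageLike {
  setUserAgent(userAgent: string): Promise<void>;
  evaluate(expression: string): Promise<unknown>;
}

// Sets the user-agent on the page, then asks the page what it reports.
// Returns the string the page sees as navigator.userAgent.
async function setAndReadUserAgent(page: PageLike, userAgent: string): Promise<string> {
  await page.setUserAgent(userAgent);
  // The string expression is evaluated in the page context, not in Node.
  const reported = await page.evaluate('navigator.userAgent');
  return String(reported);
}

// Possible usage with a real Puppeteer page (assumes `browser` and
// `userAgentStr` from the snippets above):
//   const [page] = await browser.pages();
//   console.log(await setAndReadUserAgent(page, userAgentStr));
```

Whether an already loaded document picks up the override immediately is something to verify in your own setup; reading the value again after the first page.goto is the safer check.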
Hope this helps someone!