I am using Puppeteer along with a proxy service, and after seeing unexplained high bandwidth usage I used a local proxy server to monitor the requests that were generating it. I discovered that almost 90% of the traffic was spent requesting crx files/updates.
My project requires me to open a few thousand browsers every hour, in order to keep each task with its own cookies and proxy. Every Chromium browser I open will eventually download ~10-15MB of files, using the proxy that is passed as an arg to puppeteer.launch:
puppeteer.launch({
  headless: false,
  args: [
    `--proxy-server=http://${this.proxy.host}:${this.proxy.port}`
  ]
});
These requests do not appear in the network section of devtools and cannot be intercepted using:
await this.page.setRequestInterception(true);
this.page.on("request", cb);
I started a local proxy server and passed it to Puppeteer via the launch args in order to monitor the requests Chrome made through it. This is how I found out about these downloads. I blocked the first domain that Chromium was using to download these crx files, but Chromium started to download them from another domain, and so on. Some of these domains and URLs are:
There were even more. When I block one domain, Chromium finds another. These files are downloaded for every new browser launched, using expensive proxy bandwidth.
Is there a way to stop these downloads, or at least make Chromium download them only once rather than for every new browser launched? Can I at least instruct Chrome to download these files without using the proxy?
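The monitoring setup described above can be sketched roughly as follows. This is only an illustration, not the proxy that was actually used: it assumes Node's built-in http and net modules, and the names createMonitoringProxy, recordBytes and bytesByHost are made up for this example. For HTTPS traffic a proxy like this only sees the CONNECT host and port, not the full URL.

```javascript
const http = require("http");
const net = require("net");

// Hypothetical tally of response bytes relayed per upstream host
// (keyed by host, with the port where one was given).
const bytesByHost = new Map();

function recordBytes(host, count) {
  bytesByHost.set(host, (bytesByHost.get(host) || 0) + count);
}

// Builds a forwarding proxy that logs every request Chromium sends through it.
function createMonitoringProxy() {
  const server = http.createServer((req, res) => {
    // Plain HTTP: a proxied request line carries an absolute URL.
    let target;
    try {
      target = new URL(req.url);
    } catch {
      res.writeHead(400);
      res.end();
      return;
    }
    console.log(`HTTP ${req.method} ${req.url}`);
    const upstream = http.request(
      {
        host: target.hostname,
        port: target.port || 80,
        path: target.pathname + target.search,
        method: req.method,
        headers: req.headers,
      },
      (upRes) => {
        res.writeHead(upRes.statusCode, upRes.headers);
        upRes.on("data", (chunk) => recordBytes(target.host, chunk.length));
        upRes.pipe(res);
      }
    );
    upstream.on("error", () => res.destroy());
    req.pipe(upstream);
  });

  // HTTPS: Chromium sends "CONNECT host:port" and then tunnels TLS through us.
  server.on("connect", (req, clientSocket, head) => {
    console.log(`CONNECT ${req.url}`);
    const [host, port] = req.url.split(":");
    const upstream = net.connect(Number(port) || 443, host, () => {
      clientSocket.write("HTTP/1.1 200 Connection Established\r\n\r\n");
      upstream.write(head);
      upstream.pipe(clientSocket);
      clientSocket.pipe(upstream);
    });
    upstream.on("data", (chunk) => recordBytes(req.url, chunk.length));
    upstream.on("error", () => clientSocket.destroy());
    clientSocket.on("error", () => upstream.destroy());
  });

  return server;
}

// Example start-up (port 8888 is an arbitrary choice):
// createMonitoringProxy().listen(8888, "127.0.0.1");
```

With the server listening, passing --proxy-server=http://127.0.0.1:8888 in the launch args routes the browser's traffic through it, and bytesByHost shows which hosts account for the bandwidth.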
This happens for both v5.5.0 and v8.0.0.
After a lot of time trying to find out what this extension is that Chrome always has to download, I found out about Chromium Components, which can be inspected at chrome://components. It looks like these are also shipped as crx files.
In my particular case Chrome was downloading "pnacl". The only way I was able to identify it was by recognising the version number from the first link that I posted in my question (0.57.44.2492). Using chrome://components in a browser instance launched by Puppeteer with the headless option set to false, I found that pnacl had the exact same version.
I was able to prevent Chrome from downloading this component using the flag --disable-component-update. This flag is used by default by some webdrivers but not by the one that puppeteer (v5.5.0 or v8.0.0) downloads.
If anybody else encounters this problem, yours may be caused by an extension rather than a component, in which case you would also want a flag that disables extension updates. There is none, so I use --disable-extensions and --disable-default-apps just to make sure.
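Put together, a launch call with these flags might look like the sketch below. The helper names buildLaunchArgs and launchBrowser are hypothetical and only used here for illustration; the proxy object is assumed to have the same host and port fields as in the question.

```javascript
// Hypothetical helper: builds the Chromium args for one browser instance.
function buildLaunchArgs(proxy) {
  return [
    `--proxy-server=http://${proxy.host}:${proxy.port}`,
    "--disable-component-update", // stops component (e.g. pnacl) downloads
    "--disable-extensions",       // extra precaution against extension traffic
    "--disable-default-apps",     // extra precaution against default app installs
  ];
}

// Hypothetical wrapper around puppeteer.launch using the args above.
async function launchBrowser(proxy) {
  const puppeteer = require("puppeteer"); // loaded lazily, only when launching
  return puppeteer.launch({
    headless: false,
    args: buildLaunchArgs(proxy),
  });
}
```

Calling launchBrowser({ host: "127.0.0.1", port: 8888 }) would then start a browser whose traffic goes through that proxy without the component update requests.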