node.js web-scraping proxy puppeteer chromium

How to use puppeteer with browserless and proxy


I can't figure out how to use puppeteer with browserless and proxy. I keep getting proxy connection errors.

I run browserless in docker like so:

docker run -p 3000:3000 -e "MAX_CONCURRENT_SESSIONS=5" -e "MAX_QUEUE_LENGTH=0" -e "PREBOOT_CHROME=true" -e "CONNECTION_TIMEOUT=300000" --restart always browserless/chrome

The Puppeteer options in my config that I tried to connect with:

const args = [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-infobars',
    '--window-position=0,0',
    '--ignore-certificate-errors',
    '--window-size=1400,900',
    '--ignore-certificate-errors-spki-list',
];

const options = {
    args,
    headless: true,
    ignoreHTTPSErrors: true,
    defaultViewport: null,
    browserWSEndpoint: `ws://localhost:3000?--proxy-server=socks5://127.0.0.1:9055`,
}

How I connect:

const browser = await puppeteer.connect(config.options);
const page = await browser.newPage();
await page.goto('http://example.com', { waitUntil: 'networkidle0' });

Error I get:

Error: net::ERR_PROXY_CONNECTION_FAILED at http://example.com
    at navigate (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:115:23)
    at processTicksAndRejections (internal/process/task_queues.js:94:5)
    at async FrameManager.navigateFrame (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
    at async Frame.goto (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:417:16)
    at async Page.goto (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:825:16)

The proxy in the example above is Tor Browser running in the background. I can connect through it when I skip browserless and use puppeteer.launch(): I put the proxy in args and everything works, with requests going through the Tor proxy. I can't figure out why it doesn't work with browserless and WebSockets, though.

Of course, I tried different proxies. I created a local proxy in Node, similar to the one in "How to create a simple http proxy in node.js?" (the proxy-server option is then --proxy-server=http://127.0.0.1:3001), but the error is the same, and I can't even see incoming requests in the server's terminal. It looks like they never reach the proxy.

I also tried public proxy addresses, with the same error.

Changing the website I'm trying to connect to in page.goto() doesn't change anything; I still get the same error.

I'm a beginner at web scraping and have run out of options here. Any idea would be helpful.


Solution

  • OK, it looks like a Docker networking issue. Inside the container, 127.0.0.1 (localhost) refers to the container itself, so browserless could not reach Tor, which runs on the host. I used host.docker.internal instead of the loopback address for the proxy in the connection string, and it worked.
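
For illustration, the fix can be sketched roughly as follows. This is a minimal sketch, not a definitive setup: it assumes browserless is published on localhost:3000 and Tor's SOCKS port is 9055, as in the question, and buildBrowserlessEndpoint is a hypothetical helper name, not part of Puppeteer or browserless.

```javascript
// Sketch: connect Puppeteer to browserless (running in Docker) through a
// proxy that runs on the Docker host.
// Assumptions: browserless on localhost:3000, Tor SOCKS proxy on host port 9055.

// Hypothetical helper: appends the Chrome launch flag to the WebSocket URL,
// in the same query-string form the question uses.
function buildBrowserlessEndpoint(proxyUrl, browserlessUrl = 'ws://localhost:3000') {
  return `${browserlessUrl}?--proxy-server=${proxyUrl}`;
}

// The flag is evaluated by Chrome inside the container, where 127.0.0.1 is
// the container itself. host.docker.internal points at the Docker host.
const endpoint = buildBrowserlessEndpoint('socks5://host.docker.internal:9055');

async function main() {
  // Loaded lazily so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.connect({
    browserWSEndpoint: endpoint,
    defaultViewport: null,
  });
  try {
    const page = await browser.newPage();
    await page.goto('http://example.com', { waitUntil: 'networkidle0' });
    console.log(await page.title());
  } finally {
    // Ends the remote session so it does not count against MAX_CONCURRENT_SESSIONS.
    await browser.close();
  }
}

// To try it against a live browserless container, call:
// main().catch(console.error);
```

Note that the ws://localhost:3000 part stays unchanged, because that address is resolved by the Node process on the host; only the proxy address is resolved inside the container. On Linux hosts host.docker.internal is usually not defined by default; with Docker 20.10 or later it can be mapped by adding --add-host=host.docker.internal:host-gateway to the docker run command.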