node.js multithreading puppeteer webautomation node-worker-threads

How to run Puppeteer multithreaded using worker_threads for web automation


Hello, I am doing some web automation and I want to run Puppeteer multithreaded, by which I mean opening the same page tens of times. From what I have read, worker threads seem to be the best solution, but I did not understand how to use them properly. Here is a sample of what I tried:

    const { Worker, isMainThread } = require('worker_threads');
    const puppeteer = require('puppeteer');

    // Launch a browser, open the page, and log how long the navigation took (ms).
    let scrapt = async () => {
      try {
        const browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        await page.setUserAgent(
          `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36`
        );
        let Browser_b = new Date();
        await page.goto('https://www.supremenewyork.com/');
        let browser_e = new Date();
        console.log(browser_e - Browser_b);
      } catch (e) {
        console.log(e);
      }
    };

    let ex = [1, 2, 3, 4];
    if (isMainThread) {
      // This re-loads the current file inside a Worker instance.
      new Worker(__filename);
    } else {
      // Inside the worker: start one browser per entry in `ex`.
      for (let val of ex) {
        scrapt();
      }
    }

This script opens 4 browsers, but if I open more the PC lags a lot. I think it is only using one thread instead of all of them? Thank you in advance, and sorry if this is a basic question.


Solution

  • Have you tried using Cluster? It is a good way to do multiprocessing and, in my opinion, easier to use than worker_threads. Here is an example from the linked source:

    const cluster = require('cluster');
    const http = require('http');
    const numCPUs = require('os').cpus().length;
    
    if (cluster.isMaster) {
      console.log(`Master ${process.pid} is running`);
    
      // Fork workers.
      for (let i = 0; i < numCPUs; i++) {
        cluster.fork();
      }
    
      cluster.on('exit', (worker, code, signal) => {
        console.log(`worker ${worker.process.pid} died`);
      });
    } else {
      // Workers can share any TCP connection
      // In this case it is an HTTP server
      http.createServer((req, res) => {
        res.writeHead(200);
        res.end('hello world\n');
      }).listen(8000);
    
      console.log(`Worker ${process.pid} started`);
    }
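
The HTTP server example above does not touch Puppeteer, so here is a rough sketch of how the same cluster pattern might be applied to the question. Treat it as an adaptation under assumptions, not a tested recipe: `splitJobs`, `visit`, `main`, and the `VISIT_COUNT` environment variable are hypothetical names made up for this sketch, and opening several pages inside one browser per worker process is a design choice of the sketch, not something the answer above prescribes. `cluster.isPrimary` is the newer name for `cluster.isMaster` (Node 16+), so the sketch falls back to the old name when the new one is missing.

```javascript
const cluster = require('cluster');
const os = require('os');

// Newer Node versions expose isPrimary; older ones only have isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

// Hypothetical helper: split `total` page visits as evenly as possible
// across at most `workers` processes, dropping workers with nothing to do.
function splitJobs(total, workers) {
  const counts = new Array(workers).fill(Math.floor(total / workers));
  for (let i = 0; i < total % workers; i++) counts[i] += 1;
  return counts.filter((n) => n > 0);
}

// Hypothetical worker task: open `url` in `count` pages of ONE browser.
async function visit(url, count) {
  const puppeteer = require('puppeteer'); // loaded lazily, only in workers
  const browser = await puppeteer.launch({ headless: true });
  try {
    await Promise.all(
      Array.from({ length: count }, async () => {
        const page = await browser.newPage();
        const start = Date.now();
        await page.goto(url);
        console.log(`worker ${process.pid}: loaded in ${Date.now() - start} ms`);
        await page.close();
      })
    );
  } finally {
    await browser.close();
  }
}

// Primary: fork one process per share of the work and pass the share
// through the environment. Worker: do its share, then exit explicitly
// (a cluster worker otherwise stays alive on its IPC channel).
function main(url, total) {
  if (isPrimary) {
    for (const count of splitJobs(total, os.cpus().length)) {
      cluster.fork({ VISIT_COUNT: String(count) });
    }
  } else {
    visit(url, Number(process.env.VISIT_COUNT))
      .catch((err) => {
        console.error(err);
        process.exitCode = 1;
      })
      .finally(() => process.exit());
  }
}
```

To try it, add a call such as `main('https://www.supremenewyork.com/', 10)` at the bottom of the file. Each forked process re-runs the file, so the same call takes the worker branch there. Keeping the number of browsers at or below the CPU count, rather than launching one browser per page, may reduce the lag described in the question, since each `puppeteer.launch` starts a separate Chromium process.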