I have looked at the puppeteer-cluster package, but its examples are very manual and my situation is very dynamic, so I'll try my best to explain.
I have an app in which users schedule posts. When a post's scheduled time arrives, Puppeteer runs, goes to the site, logs the user in with credentials from my app's database, and posts the content. Fairly simple.
The problem arises when, say, 20 users all decide to post today at 1 PM. Puppeteer then spawns that many Chromium instances at once, which overwhelms the server because it has limited RAM. What I'm asking, basically, is how to achieve the following:
1) Limit Puppeteer's concurrency to 10 instances. If there are more jobs than that, it should work in batches: run the first 10, close them, then start the next 10, and so on.
2) If there are fewer than 10, just keep the normal behavior.
I know this may look like I'm handing you my homework, but I really just need some guidance; a little help or a pointer in the right direction would suffice. Alternatively, could you show me how to use puppeteer-cluster dynamically to suit my needs? Many thanks!
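The two requirements above can be sketched in plain JavaScript, independent of Puppeteer: split the due jobs into batches of 10 and wait for each batch with `Promise.all` before starting the next. With fewer than 10 jobs there is only one batch, so the behavior is unchanged. This is a minimal sketch that assumes each job is an async function; `runInBatches` and `postForUser` are hypothetical names, and `postForUser` only simulates work with a timer instead of driving a browser.

```javascript
// Minimal sketch: run async jobs in fixed-size batches.
// A batch must finish completely before the next batch starts,
// so at most `batchSize` jobs are ever in flight.
async function runInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Start every job in this batch, then wait for all of them to finish.
    // Promise.all rejects as soon as one job fails, which skips the
    // remaining batches; real code might prefer Promise.allSettled.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Usage with a hypothetical job that only simulates "log in and post".
let inFlight = 0;
let maxInFlight = 0;

async function postForUser(user) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10)); // pretend to work
  inFlight--;
  return `posted for ${user}`;
}

const users = Array.from({ length: 25 }, (_, i) => `user${i + 1}`);
runInBatches(users, 10, postForUser).then((results) => {
  console.log(results.length, maxInFlight); // 25 10
});
```

In the real app, `postForUser` would launch the browser, log in, post, and close the browser before returning, so each batch's Chromium instances are gone before the next batch starts.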
Code snippet:
const { Cluster } = require('puppeteer-cluster');
const runChunks = async (chunkArr, chunkSize) => {
  // Launch a cluster for this chunk
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: chunkSize, // at most chunkSize pages run at the same time
  });
  // Task to run for every queued URL
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    console.log('Reached: ', url);
    // The code for the actual scraping/posting task goes here ...
  });
  // Queue every URL in this chunk
  chunkArr.forEach(data => {
    cluster.queue(data.url);
  });
  // Wait until all queued jobs are done, then close the cluster
  await cluster.idle();
  await cluster.close();
};
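// Split arr into consecutive chunks of chunkSize items (the last chunk may be shorter).
// slice() leaves the caller's array untouched; the previous splice()-based loop emptied it
// and could stop early, e.g. 5 items with chunkSize 2 left the last item unprocessed.
function chunkArrGenerator(arr, chunkSize) {
  const chunksArr = [];
  for (let i = 0; i < arr.length; i += chunkSize) {
    chunksArr.push(arr.slice(i, i + chunkSize));
  }
  return chunksArr;
}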
// Assume the request array holds 100 objects, each with url data
let arr = [{ url: "https://www.amazon.in/" }, { url: "https://www.flipkart.com/" }, { url: "https://www.crateandbarrel.com/" }, { url: "https://www.cb2.com/" } /* so on ... */];
let size = 2; //chunk size
// The following line creates chunks of size 2; change it to 10 for your case
let chunks = chunkArrGenerator(arr, size);
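// Run the chunks one after another: each cluster is closed before the next one starts.
// (chunks.forEach(async ...) would not wait for the callback, so every chunk would start at once.)
(async () => {
  for (const chunk of chunks) {
    await runChunks(chunk, size);
  }
})();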
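For reference, puppeteer-cluster already queues jobs: `cluster.queue()` can be called for every due post, and at most `maxConcurrency` of them run at the same time while the rest wait. So a single cluster launched with `maxConcurrency: 10` should cover both requirements without manual chunking, and with fewer than 10 jobs they all start right away. To show that queueing idea without depending on the library, here is a plain-JavaScript sketch of a rolling limit, where a new job starts as soon as any running one finishes instead of waiting for a whole batch. It is a swapped-in technique, not puppeteer-cluster's actual implementation; `runWithLimit` and `fakePost` are hypothetical names.

```javascript
// Minimal sketch of a rolling concurrency limit (not puppeteer-cluster's code).
// At most `limit` jobs run at once; whenever one finishes, the next queued
// item starts. Assumes limit >= 1.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each lane keeps taking the next item until none are left.
  // Reading and incrementing `next` happens synchronously, so two lanes
  // never take the same item.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  // Start min(limit, items.length) lanes; with fewer items than the limit,
  // every job starts immediately.
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage with a hypothetical job that only simulates a post.
let running = 0;
let peakRunning = 0;

async function fakePost(user) {
  running++;
  peakRunning = Math.max(peakRunning, running);
  await new Promise((resolve) => setTimeout(resolve, 10));
  running--;
  return user.toUpperCase();
}

runWithLimit(["a", "b", "c", "d", "e"], 2, fakePost).then((out) => {
  console.log(out.join(","), peakRunning); // A,B,C,D,E 2
});
```

Compared with fixed batches, a rolling limit keeps all slots busy when some posts take longer than others, because one slow login no longer holds up the start of the next group.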