Sorry for my naive question, I'm very new to Node.js.
I'm building a polling service that will handle many tasks at the same time, and each task might take 10 to 15 seconds to finish.
This is my Poller class:
const EventEmitter = require('events');

class Poller extends EventEmitter {
  constructor(timeout) {
    super();
    this.timeout = timeout;
  }
  poll() {
    setTimeout(() => this.emit("poll"), this.timeout);
  }
  onPoll(fn) {
    this.on("poll", fn); // listen for the "poll" event and run function "fn"
  }
}
And this is my current code inside each poll:
let poller = new Poller(3000); // 3 seconds
poller.onPoll(() => {
  // handle many tasks at the same time
  for (let task of tasks) {
    // handleTask function will take 15 seconds
    // (query database, make http request...)
    handleTask(task);
  }
  poller.poll();
});
If the number of tasks grows to, say, 100, should I handle all 100 tasks at the same time? Or should I process them in batches of 10 and then continue to the next poll, like this:
const promises = [];
// 10 tasks only
for (let task of tasks) {
  promises.push(handleTask(task));
}
// wait until all 10 tasks finish
await Promise.all(promises);
// go to the next poll
poller.poll();
But Promise.all will reject as soon as any handleTask call fails.
Another solution I'm considering is using Node.js workers, scaled according to the number of CPU cores available on my machine, so that each handleTask call runs on its own worker:
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // cluster emits 'exit' (not 'death') when a worker dies
  cluster.on('exit', function(worker) {
    console.log('worker ' + worker.process.pid + ' died');
  });
}
Another thing I've seen on some websites is using child_process. If I use child_process, how many processes can I fork?
For example:
var cluster = require('cluster');

if (cluster.isMaster) {
  // fork child processes for handleTask
  var handleTask1 = require('child_process').fork('./handleTask');
  var handleTask2 = require('child_process').fork('./handleTask');
}
in the handleTask.js file (listening for messages from the parent):
// a forked child receives data via the 'message' event
process.on('message', function(data) {
  handleTask(data);
});
What is the best way to handle parallel tasks in Node.js?
Node was designed to handle many concurrent IO-bound operations (database queries and HTTP calls) at the same time. This is accomplished in the Node runtime through an event loop and asynchronous IO.
What this means is that, at the most basic level, you don't have to do anything special to handle hundreds or thousands of handleTask calls at a single time.
Each handleTask invocation will enqueue IO events internally in Node. This allows Node to start one handleTask HTTP call, then switch to another, then another, then start receiving the response of an earlier call. It does this extremely quickly, and ideally without you having to worry about it.
Internally it handles these events in a queue, so if you have tens of thousands of concurrent operations, there will be some latency penalty between the time an operation completes and the time the Node runtime is able to process that operation.
There are common situations where you do have to manage concurrency yourself, for example:
- handleTask makes an HTTP call to a metered (i.e. rate-limited) resource; you need to closely control, and back off on, requests to that resource.
The answer that you'll commonly see is to execute the tasks as they come in and let the Node runtime handle scheduling them. As I mentioned, it is very important that you have latency metrics (or implement load shedding, or batching) in order to determine whether Node's internal event queues are overloaded.
Essential Reading: