Tags: node.js, asynchronous, concurrency, fork

Should I use clustering instead of asynchronous code to handle parallel tasks in Node.js?


Sorry for my naive question; I'm very new to Node.js.

I'm building a poller that will handle many tasks at the same time, and each task might take 10 to 15 seconds to finish. This is my Poller class:

class Poller extends EventEmitter {
    constructor(timeout) {
        super();
        this.timeout = timeout;
    }

    poll() {
        setTimeout(() => this.emit("poll"), this.timeout);
    }

    onPoll(fn) {
        this.on("poll", fn); // listen action "poll", and run function "fn"
    }
}

And this is my current code inside each poll:

let poller = new Poller(3000); // 3 seconds
poller.onPoll(() => {
    // handle many tasks at the same time
    for (let task of tasks) {
        // handleTask will take about 15 seconds
        // (query database, make http requests...)
        handleTask(task);
    }
    poller.poll();
});

poller.poll(); // kick off the first poll

If the number of tasks grows to, say, 100, should I handle all 100 tasks at the same time, or should I process them in batches of 10 and then continue to the next poll, like this:

const promises = [];
// take a batch of 10 tasks only
for (let task of tasks.slice(0, 10)) {
    promises.push(handleTask(task));
}
// wait until all 10 tasks finish
await Promise.all(promises);
// go to the next poll
poller.poll();

But Promise.all will reject as soon as any single handleTask call fails, so the whole batch is treated as failed.
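
A related option is Promise.allSettled (available since Node 12.9), which waits for every task in the batch and reports each failure individually instead of rejecting on the first one; a minimal sketch, assuming handleTask returns a promise:

// Process the batch, but don't let one failed task reject the whole batch.
const results = await Promise.allSettled(tasks.slice(0, 10).map(task => handleTask(task)));
for (const result of results) {
    if (result.status === "rejected") {
        console.error("task failed:", result.reason);
    }
}
// every task has settled, go to the next poll
poller.poll();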

Another solution I'm thinking about is using Node.js workers, scaling by the number of CPU cores available on my machine, so that each handleTask call runs on a worker:

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // 'exit' is emitted when a worker dies ('death' was the old 0.x event name).
  cluster.on('exit', function(worker) {
    console.log('worker ' + worker.process.pid + ' died');
  });
}

Another thing I see on some websites is using child_process. If I use child_process, how many processes can I fork? For example:

var cluster = require('cluster');

if (cluster.isMaster) {
  // fork a child process for each handleTask worker
  var handleTask1 = require('child_process').fork('./handleTask');
  var handleTask2 = require('child_process').fork('./handleTask');
}

In the handleTask.js file (a forked child receives data from its parent through the 'message' event, so I listen for that):

process.on('message', function(data) {
  handleTask(data); // run one task inside this child process
});
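
For the parent side, my understanding is that tasks are handed to the forked children with child.send(); a minimal sketch, where the two-child round-robin split is only an illustration:

var child_process = require('child_process');

var workers = [
  child_process.fork('./handleTask'),
  child_process.fork('./handleTask')
];

// Hand each task to a child in round-robin fashion; each child
// receives it through its 'message' event and runs handleTask(data).
tasks.forEach(function(task, i) {
  workers[i % workers.length].send(task);
});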

What is the best way to handle parallel tasks in Node.js?


Solution

  • Node was designed to handle many concurrent IO-bound operations (database queries and HTTP calls) at the same time. This is accomplished in the Node runtime through an event loop and asynchronous IO.

    What this means is that, at the most basic level, you don't have to do anything special to handle hundreds or thousands of handleTask calls at a time.

    Each handleTask invocation will enqueue IO events internally in Node. This allows Node to start one handleTask HTTP call, then switch to another, then another, and then start receiving the response of an earlier call. It does this very quickly and, ideally, without you having to worry about it.

    Internally it handles these events in a queue, so if you have tens of thousands of concurrent operations there will be some latency penalty between the time an operation completes and the time the Node runtime is able to process that completion.
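
    As a concrete illustration (a minimal sketch, assuming handleTask returns a promise), the original loop already runs all the tasks concurrently, because each call returns a pending promise almost immediately and the actual IO is completed later by the event loop:

    poller.onPoll(() => {
        // All calls start right away; none of them blocks the event loop.
        // Node interleaves the underlying DB/HTTP IO as responses arrive.
        const pending = tasks.map(task => handleTask(task));

        // Schedule the next poll once every task has settled.
        Promise.allSettled(pending).then(() => poller.poll());
    });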

    There are many common situations where you do have to manage concurrency:

    • Suppose handleTask makes an HTTP call to a metered (i.e. rate-limited) resource; you need to closely control and back off on requests to that resource
    • Providing an upper bound on the amount of work you allow into the system in order to keep latencies acceptable (load shedding, bulkheading); see the sketch after this list
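
    A minimal sketch of bounding the number of in-flight tasks, assuming handleTask returns a promise (the limit of 10 is arbitrary, and libraries such as p-limit package the same pattern):

    // Run tasks with at most `limit` of them in flight at any time.
    async function runWithLimit(tasks, limit) {
        const executing = new Set();
        for (const task of tasks) {
            const p = handleTask(task).catch(err => console.error("task failed:", err));
            executing.add(p);
            p.finally(() => executing.delete(p));
            if (executing.size >= limit) {
                await Promise.race(executing); // wait for a slot to free up
            }
        }
        await Promise.allSettled(executing); // drain whatever is still running
    }

    runWithLimit(tasks, 10).then(() => poller.poll());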

    What is the best way to handle parallel tasks in Node.js?

    The answer that you'll commonly see is to execute the tasks as they come in and let the Node runtime handle scheduling them. As I mentioned, it is very important that you have latency metrics (or implement load shedding or batching) in order to determine whether the Node internal event queues are overloaded.
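
    One concrete way to get such a metric is Node's built-in event loop delay histogram in perf_hooks (available since Node 11.10); a minimal sketch, where the 50 ms threshold and 10 second interval are arbitrary:

    const { monitorEventLoopDelay } = require('perf_hooks');

    const histogram = monitorEventLoopDelay({ resolution: 20 });
    histogram.enable();

    setInterval(() => {
        // The histogram reports delays in nanoseconds.
        const p99Ms = histogram.percentile(99) / 1e6;
        if (p99Ms > 50) {
            console.warn(`event loop p99 delay is ${p99Ms.toFixed(1)} ms -- consider shedding load`);
        }
        histogram.reset();
    }, 10000);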

    Essential Reading: