Tags: node.js, nonblocking, pool

Non-blocking loop in Node.js and pooling?


I'm starting to play around with Node.js, and I have an application that basically iterates over tens of thousands of objects, performs various asynchronous HTTP requests for each of them, and populates each object with data returned from those requests. This question is mostly about best practices with Node.js, non-blocking operations, and probably pooling. Forgive me if I'm using the wrong terms, as I'm new to this, and please don't hesitate to correct me.

Below is a brief summary of the code. I have a loop that iterates over thousands of objects:

//Loop briefly summarized
for (var i = 0; i < arrayOfObjects.length; i++) {
    do_something(arrayOfObjects[i], function (error, result) {
        if (error) {
            //various logging
        } else {
            console.log(result);
        }
    });
}

//dosomething briefly summarized
function do_something(obj, callback) {
    http.request(url1, function (err, result) {
        if (!err) {
            insert_in_db(result.value1, function (dbErr, dbResult) {
                //another asynchronous http request
            });
        } else {
            //various error logging
        }
    });
    http.request(url2, function (err, result) {
        //some further logic, including a db call
    });
    //callback(error, result) is eventually invoked once the work above completes
}

In reality, do_something contains more complex logic, but that isn't really the point right now. So my problem is the following:

I think the main issue is that my loop is not really optimized, because it's kind of a blocking operation: the first HTTP request results inside do_something only become available after the loop has finished processing, and then it cascades. Is there a way to create a pool of, say, 10 or 20 maximum simultaneous executions of do_something, with the rest queued until a pool slot becomes available?

I hope I explained myself clearly; don't hesitate to ask if you need more details.

Thanks in advance for your feedback,

Anselme


Solution

  • Your loop isn't blocking, per se, but it's not optimal. One of the things it does is schedule arrayOfObjects.length HTTP requests. Those requests will all be scheduled right away, as your loop progresses. In older versions of Node.js you would have benefited from a default of 5 concurrent requests per host (the HTTP agent's maxSockets), but that default was later changed.

    But the actual opening of sockets, sending of requests, and waiting for responses will happen individually for each iteration, and each entry will finish in its own time (depending, in this case, on the remote host, database response times, etc.).

    Take a look at async, vasync, or one of their many alternatives, as suggested in the comments, for pooling; a minimal sketch using async follows below.

    You can take it even a step further and use something like Bluebird's Promise.map, with the concurrency option set, depending on your use case; see the second sketch below.
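
    For example, here is a minimal sketch of the async approach, assuming do_something invokes its callback when it is done and that a limit of 10 fits your use case; async.eachLimit caps how many do_something calls are in flight at once:

    //hypothetical sketch, npm install async
    var async = require('async');

    async.eachLimit(arrayOfObjects, 10, function (obj, done) {
        do_something(obj, function (error, result) {
            if (error) {
                //log and keep going; pass the error to done(error)
                //instead if one failure should stop the whole batch
                console.error(error);
            } else {
                console.log(result);
            }
            done();
        });
    }, function (err) {
        console.log('all objects processed');
    });

    And a similar sketch with Bluebird, where do_something is first promisified; the concurrency value of 10 is again just an illustrative choice:

    //hypothetical sketch, npm install bluebird
    var Promise = require('bluebird');
    var do_somethingAsync = Promise.promisify(do_something);

    Promise.map(arrayOfObjects, function (obj) {
        return do_somethingAsync(obj);
    }, { concurrency: 10 })
        .then(function (results) {
            console.log('all objects processed', results.length);
        })
        .catch(function (err) {
            //a single rejection ends up here
            console.error(err);
        });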