Tags: node.js, http, asynchronous, blocking

Is making sequential HTTP requests a blocking operation in node?


Note that information not strictly relevant to my question will be 'quoted' like so (feel free to skip those parts).

Problem

I am using node to make in-order HTTP requests on behalf of multiple clients. What originally took the client(s) several separate page loads to get the desired result now takes only a single request to my server. I am currently using the ‘async’ module for flow control and the ‘request’ module for making the HTTP requests. There are approximately 5 callbacks which, measured with console.time, take about 2 seconds from start to finish (sketch code included below).

Now, I am rather inexperienced with node, but I am aware of its single-threaded nature. While I have read many times that node isn’t built for CPU-bound tasks, I didn’t really understand what that meant until now. If my understanding of what’s going on is correct, it means that what I currently have (in development) will in no way scale to even 10 clients.

Question

Since I am not an expert at node, I ask this question (in the title) to confirm whether making several sequential HTTP requests is indeed blocking.

Epilogue

If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).

Other closing thoughts

I am truly sorry if this question was not detailed enough, was too noobish, or used particularly flowery language (I try to be concise).

Thanks and all the upvotes to anyone who can help me with my problem!

The code I mentioned earlier:

var async = require('async');
var request = require('request');

...

async.waterfall([
    function(cb) {
        console.time('1');

        request(someUrl1, function(err, res, body) {
            // load and parse the given web page.

            // make a callback with data parsed from the web page
        });
    },
    function(someParameters, cb) {
        console.timeEnd('1');
        console.time('2');

        request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
            // more computation

            // make a callback with a session cookie given by the visited url
        });
    },
    function(jar, cb) {
        console.timeEnd('2');
        console.time('3');

        request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
            // do more parsing + computation

            // make another callback with the results
        });
    },
    function(moreParameters, cb) {
        console.timeEnd('3');
        console.time('4');

        request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
            // make final callback after some more computation.
            //This part takes about ~1s to complete
        });
    }
], function (err, result) {
    console.timeEnd('4');
    res.status(200).send();
});
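
Each function is meant to end by calling cb with its results so that async.waterfall passes them to the next function; for example, the first one would end roughly like this (parsePage and parsedData are just placeholders for my parsing logic):

request(someUrl1, function(err, res, body) {
    if (err) return cb(err);           // pass errors straight to the final waterfall handler
    var parsedData = parsePage(body);  // placeholder for parsing the loaded page
    cb(null, parsedData);              // becomes 'someParameters' in the next function
});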

Solution

  • Normally, I/O in node.js is non-blocking. You can test this by making several requests to your server simultaneously. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests, but a non-blocking server would take just a bit more than 1 second to process both.
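
    One quick way to run that test against your own server is to fire two requests at it at the same time and compare the total wall-clock time with the latency of a single request. A minimal sketch, assuming your endpoint lives at http://localhost:3000/ (the URL is a placeholder; adjust it to your setup):

    var request = require('request');

    var url = 'http://localhost:3000/'; // placeholder; point this at your own endpoint
    var start = Date.now();
    var pending = 2;

    // Fire both requests without waiting for the first one to finish.
    [1, 2].forEach(function(i) {
        request(url, function(err, res, body) {
            console.log('request ' + i + ' done after ' + (Date.now() - start) + 'ms');
            if (--pending === 0) {
                // Roughly one request's latency => the handler is non-blocking;
                // roughly double => something is blocking the event loop.
                console.log('both done after ' + (Date.now() - start) + 'ms');
            }
        });
    });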

    However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.

    Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:

    var req = require('request');
    var sync = require('sync-request');
    
    // Load example.com N times (yes, it's a real website):
    var N = 10;
    
    console.log('BLOCKING test ==========');
    var start = new Date().valueOf();
    for (var i=0;i<N;i++) {
        var res = sync('GET','http://www.example.com');
        console.log('Downloaded ' + res.getBody().length + ' bytes');
    }
    var end = new Date().valueOf();
    console.log('Total time: ' + (end-start) + 'ms');
    
    console.log('NON-BLOCKING test ======');
    var loaded = 0;
    var start = new Date().valueOf();
    for (var i=0;i<N;i++) {
        req('http://www.example.com',function( err, response, body ) {
            loaded++;
            console.log('Downloaded ' + body.length + ' bytes');
            if (loaded == N) {
                var end = new Date().valueOf();
                console.log('Total time: ' + (end-start) + 'ms');
            }
        })
    }
    

    Running the code above, you'll see that the non-blocking test takes roughly the same amount of time to process all N requests as it does to process a single one (for example, with N = 10 the non-blocking code finishes roughly 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.


    Additional answer:

    You also mentioned that you're worried about your process being CPU-intensive. But in your code you aren't actually benchmarking CPU time: you're mixing network request time (I/O, which we know is non-blocking) together with CPU processing time. To measure how much time your handler actually spends blocking, change your code to this:

    async.waterfall([
        function(cb) {
            request(someUrl1, function(err, res, body) {
                console.time('1');
                // load and parse the given web page.
                console.timeEnd('1');
                // make a callback with data parsed from the web page
            });
        },
        function(someParameters, cb) {
            request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
                console.time('2');
                // more computation
                console.timeEnd('2');
    
                // make a callback with a session cookie given by the visited url
            });
        },
        function(jar, cb) {
            request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
                console.time('3');
                // do more parsing + computation
                console.timeEnd('3');
                // make another callback with the results
            });
        },
        function(moreParameters, cb) {
            request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
                console.time('4');
                // some more computation.
                console.timeEnd('4');
    
                // make final callback
            });
        }
    ], function (err, result) {
        res.status(200).send();
    });
    

    Your code only blocks in the "more computation" parts, so you can completely ignore any time spent waiting for the HTTP responses. In fact, that's exactly how node can serve multiple requests concurrently: while waiting for the remote servers to respond and call the respective callbacks (which you mention may take up to about 1 second), node can execute other JavaScript code and handle other requests.
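
    If you want to see directly how long those computation parts tie up the event loop, a simple sketch is to run a timer alongside your handlers and report how late each tick fires; any delay well beyond the interval is time the loop was blocked (the 100ms interval and 50ms threshold here are arbitrary choices):

    // Logs a warning whenever the event loop was held up noticeably.
    var interval = 100; // ms; arbitrary sampling interval
    var last = Date.now();
    setInterval(function() {
        var now = Date.now();
        var lag = now - last - interval; // how late this tick fired
        if (lag > 50) {
            console.log('event loop blocked for roughly ' + lag + 'ms');
        }
        last = now;
    }, interval);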