
Node cluster not spawning phantom instances in the proper worker


I'm using Node.js with PhantomJS. My goal is to create 4 Node instances with the cluster module, each with 2 PhantomJS children. My code looks like this:

cluster.js:

var cluster = require('cluster');
var numCPUs = 4;

if (cluster.isMaster) {

    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
        cluster.fork();
    });

} else {
    require("./app");
}

app.js looks like this:

var instances = [];
var phantom = require('phantom');

function InstanceManager(instCount) {
    for (var i = 0; i < instCount; i++) {
        phantom.create(function(phantomInstance) {
            instances.push({
                cycle: 0,
                locked: false,
                instance: phantomInstance
            });
        });
    }
}

InstanceManager(2);

setInterval(function() {
    // Print how many phantom instances this worker has collected so far
    console.log('--' + instances.length);
}, 5000);

So after running cluster.js, the expected output in the Node console every 5 seconds is:

--2
--2
--2
--2

but instead it looks like this:

--0
--0
--0
--8

Why aren't the phantom instances attached to the proper worker?


Solution

  • The problem seems to lie in the phantom module, which does not work properly with cluster. If you replace it with a test double such as

    var phantom = {
        create: function (callback) {
            setImmediate(callback);
        }
    };
    

    you get the expected output (--2 from every worker). To continue investigating, I stripped node_modules/phantom/phantom.js down to a minimal setup in which your problem still occurs:

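    var http = require('http'), shoe = require('shoe'), spawn = require('win-spawn');
    
    exports.create = function(cb) {
        var httpServer, sock;
        // Listen on port 0, i.e. let the OS pick a free ("random") port
        httpServer = http.createServer();
        httpServer.listen(0);
        httpServer.on('listening', function() {
            var listeningPort = httpServer.address().port;
            // Start phantomjs with the shim, telling it which port to connect back to
            spawn('phantomjs', [].concat([__dirname + '/shim.js', listeningPort]));
        });
        // shoe calls cb when phantomjs connects back on the /dnode endpoint
        sock = shoe(cb);
        return sock.install(httpServer, '/dnode');
    };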
    

    Here is what happens: an HTTP server starts listening, then a phantomjs process is spawned, which connects back to that server via a WebSocket and writes to it, after which the callback cb() is called. You can work this out by reading shim.js and experimenting a bit.

    So what's the problem? If you console.log() the listeningPort, you will see the same port 8 times. It appears that every call to phantom.create(), in every worker, reuses the same listening server: all 8 phantomjs processes connect back through that one port, so your callback is called in only one process.

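    This comes down to how cluster workers handle listen(0). The Node.js cluster documentation notes that, while listen(0) normally picks a random port, in a cluster each worker receives the same "random" port every time it calls listen(0): the port is random the first time and predictable thereafter. The details have varied between Node versions, which would also explain why the problem didn't occur with another version of Node (according to a comment above). Here is a gist of mine which isolates this counter-intuitive behaviour.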

    The solution is to pass an explicit port when calling phantom.create() and use a different port for each of the 8 instances (2 per worker across 4 workers), e.g., phantom.create(fn, { port: YOUR_PORT }) in your app.js.
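One way to choose those 8 ports is to derive them from the worker ID, which is what the Node cluster documentation suggests when a worker needs a unique port. The sketch below assumes an arbitrary base port of 12300 and a hypothetical helper portFor(); neither comes from the phantom module. cluster.worker.id starts at 1.

```javascript
// Hypothetical helper: a distinct port for every phantom instance in the cluster.
// BASE_PORT is an assumption; choose a range that is free on your machine.
const BASE_PORT = 12300;

// workerId: cluster.worker.id (starts at 1); instanceIndex: 0 .. perWorker - 1.
function portFor(workerId, instanceIndex, perWorker) {
  if (!Number.isInteger(workerId) || workerId < 1) {
    throw new RangeError('workerId must be a positive integer');
  }
  if (!Number.isInteger(instanceIndex) || instanceIndex < 0 || instanceIndex >= perWorker) {
    throw new RangeError('instanceIndex must be in [0, perWorker)');
  }
  return BASE_PORT + (workerId - 1) * perWorker + instanceIndex;
}

// In app.js (inside a worker) this would replace phantom.create(cb), e.g.:
//   phantom.create(cb, { port: portFor(cluster.worker.id, i, instCount) });

// Print the port layout for 4 workers with 2 instances each
for (let id = 1; id <= 4; id++) {
  console.log('worker ' + id + ': ' + [0, 1].map((i) => portFor(id, i, 2)).join(', '));
}
```

Worker IDs are never reused, so a worker re-forked by the 'exit' handler in cluster.js gets a new ID and therefore the next range (12308 and up in this layout); leave room above the initial 8 ports.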
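To experiment with the connect-back mechanism described earlier (a server listens, a child process connects back and writes, then cb fires), here is a self-contained sketch. It is not the phantom module's code: it assumes plain TCP through Node's net module in place of shoe's WebSocket transport, a short Node child script in place of phantomjs and shim.js, and a hypothetical createLikePhantom() helper.

```javascript
// Sketch of the listen / spawn / connect-back handshake, under the assumptions
// above: net replaces shoe, and a Node child replaces phantomjs + shim.js.
const net = require('net');
const { spawn } = require('child_process');

// Hypothetical helper, not part of the phantom module.
function createLikePhantom(cb) {
  // 1. Start a server on port 0 so the OS picks a free port.
  const server = net.createServer((socket) => {
    let data = '';
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => { data += chunk; });
    // 3. Once the child has written and closed its side, hand the result to cb.
    socket.on('end', () => {
      server.close();
      cb(data);
    });
  });

  server.listen(0, '127.0.0.1', () => {
    const port = server.address().port;
    // 2. Spawn the child and pass it the port on its command line,
    //    the way phantom passes listeningPort to shim.js.
    const childSource =
      "const s = require('net').connect(Number(process.argv[1]), '127.0.0.1', " +
      "() => s.end('hello from child ' + process.pid));";
    spawn(process.execPath, ['-e', childSource, String(port)], { stdio: 'inherit' });
  });
}

createLikePhantom((message) => {
  console.log('callback called in process ' + process.pid + ': ' + message);
});
```

The callback runs only in the process that owns the server the child reaches, which is consistent with what the answer describes: once all workers share one listening port, only one of them sees the connections.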