Tags: node.js, asynchronous, blocking, nonblocking, synchronous

Node.js Synchronous Library Code Blocking Async Execution


Suppose you have a 3rd-party library with a synchronous API. Naturally, attempting to use it in an async fashion yields undesirable results: you get blocked when trying to do multiple things in "parallel".

Are there any common patterns that allow us to use such libraries in an async fashion?

Consider the following example (using the async library from NPM for brevity):

var async = require('async');

function ts() {
  return new Date().getTime();
}

var startTs = ts();

process.on('exit', function() {
  console.log('Total Time: ~' + (ts() - startTs) + ' ms');
});

// This is a dummy function that simulates some 3rd-party synchronous code.
function vendorSyncCode(task) {
  var future = ts() + 50;  // ~50 ms in the future.

  while(ts() <= future) {} // Spin to simulate blocking work.
}

// My code that handles the workload and uses `vendorSyncCode`.
function myTaskRunner(task, callback) {
  // Do async stuff with `task`...

  vendorSyncCode(task);

  // Do more async stuff...

  callback();
}

// Dummy workload.
var work = (function() {
  var result = [];

  for(var i = 0; i < 100; ++i) result.push(i);

  return result;
})();

// Problem:
// -------
// The following two calls will take roughly the same amount of wall-clock time
// to complete. In this case, ~5-6 seconds each (100 tasks x ~50 ms of blocking work).

async.each(work, myTaskRunner, function(err) {});

async.eachLimit(work, 10, myTaskRunner, function(err) {});

// Desired:
// --------
// The latter call with 10 "workers" should complete roughly an order of magnitude 
// faster than the former.

Are fork/join or spawning worker processes manually my only options?


Solution

  • Yes, those are your only options.

    If you need 50 ms of CPU time to do something, and you need to do it 10 times, then you need 500 ms of CPU time in total. If you want it done in less than 500 ms of wall-clock time, you need to use more CPUs. That means multiple Node instances (or a C++ addon that pushes the work out onto the thread pool). How you get multiple instances depends on your app structure: a child process that you feed work using child_process.send() is one way; running multiple servers with cluster is another. Breaking up your server is a third. Say it's an image-store application that is mostly fast to process requests, except that converting an image into another format is CPU intensive. You could push the image-processing portion into a separate app and access it through a REST API, leaving the main app server responsive.
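    As a minimal sketch of the child_process approach: the script below forks itself, so children take the process.send branch and run the blocking call there, while the parent distributes one task per child. vendorSyncCode is the same dummy stand-in as in the question; the task values and child count are illustrative.

    ```javascript
    const { fork } = require('child_process');

    // Same dummy stand-in for the vendor's synchronous, CPU-bound call.
    function vendorSyncCode(task) {
      var future = Date.now() + 50;    // ~50 ms of simulated blocking work.
      while (Date.now() <= future) {}
      return task;
    }

    if (process.send) {
      // Child: the synchronous call blocks only this child's event loop.
      process.on('message', function(task) {
        process.send(vendorSyncCode(task));
        process.exit(0);
      });
    } else if (require.main === module) {
      // Parent: fork one child per task; they spin in parallel on separate CPUs.
      var tasks = [1, 2, 3, 4];
      var remaining = tasks.length;
      var start = Date.now();

      tasks.forEach(function(task) {
        var child = fork(__filename);
        child.on('message', function(result) {
          console.log('task ' + result + ' done');
          if (--remaining === 0) {
            console.log('wall time: ~' + (Date.now() - start) + ' ms');
          }
        });
        child.send(task);
      });
    }
    ```

    With four children the wall-clock time is close to a single 50 ms run plus fork overhead, rather than 4 × 50 ms; the same idea scales to a fixed-size pool of long-lived children if fork overhead matters.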

    If you aren't concerned that the request takes 50 ms of CPU, but rather that you can't interleave the handling of other requests with the processing of the CPU-intensive one, then you could break the work up into small chunks and schedule the next chunk with setInterval() (or setImmediate()). That's usually a horrid hack, though. Better to restructure the app.
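    The chunking idea can be sketched as follows, here using setImmediate() to yield to the event loop between chunks. processInChunks is an illustrative helper name, not part of any library, and it assumes the work can actually be split into short synchronous pieces:

    ```javascript
    // Run handleItem synchronously over items, but only chunkSize at a time,
    // yielding to the event loop between chunks so other events can interleave.
    function processInChunks(items, chunkSize, handleItem, done) {
      var i = 0;

      function runChunk() {
        var end = Math.min(i + chunkSize, items.length);
        for (; i < end; i++) {
          handleItem(items[i]); // Synchronous work, kept short per chunk.
        }
        if (i < items.length) {
          setImmediate(runChunk); // Yield before processing the next chunk.
        } else {
          done();
        }
      }

      runChunk();
    }

    // Usage: square five numbers, two per event-loop turn.
    var results = [];
    processInChunks([1, 2, 3, 4, 5], 2, function(n) {
      results.push(n * n);
    }, function() {
      console.log(results); // [ 1, 4, 9, 16, 25 ]
    });
    ```

    This keeps the server responsive between chunks, but the total CPU time is unchanged, which is why the answer recommends restructuring instead.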