I have an application that uses Express as the user-facing framework for my REST API, together with RabbitMQ for RPC-like calls to a clustered backend. I also use Q to promisify the workload in my routes.
One of my routes triggers functionality that crawls a URL specified in the route's parameters, does GeoIP lookups, normalizes result formats, etc. This can take several seconds, depending on the response times of the crawled URL's servers.
What I would like to achieve is that a user who POSTs a new URL to crawl gets immediate feedback on the request (status 200, "Crawling request acknowledged"), instead of the request waiting for the crawling to finish.
My ideas are either
What would be the best solution to this? Thanks for your valuable input.
A very loaded question that has a lot of options, each with its own impact on the overall system. I'm not sure there is a single right answer; it's really a matter of preference and what you feel comfortable with. IMO, I'd try to keep things simple: adding another process (RabbitMQ) means another software package (or even a whole server) to manage, configure, permission and secure.
A few things to consider: is the bulk of your processing I/O-bound or CPU-bound? If you're using a remote service for the GeoIP lookups, it's probably more I/O-bound, which is a perfect fit for Node. Why not have Node handle everything via:
process.nextTick(function() {
    // Do your lookup here
});
res.status(202).end();
Then use something like socket.io to send the results to the client asynchronously?
Either way, I'd recommend returning a 202 Accepted, not a 200.
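Putting it together, the whole flow can be sketched with only Node built-ins. `crawlUrl` is a hypothetical stand-in for the real crawl/GeoIP work, and the Express-style `res` object is mocked so the ordering is visible; in the real app these would be your crawler module and Express's actual response:

```javascript
// Hypothetical stand-in for the slow crawl/GeoIP work.
function crawlUrl(url, done) {
  setTimeout(function () {
    done(null, { url: url, status: 'crawled' });
  }, 10);
}

var order = [];

function handleCrawlRequest(req, res) {
  // Defer the heavy work so the acknowledgement goes out first.
  setImmediate(function () {
    crawlUrl(req.url, function (err, result) {
      order.push('crawl finished');
      // ...push "result" to the client via socket.io here...
    });
  });
  res.status(202).end('Crawling request acknowledged');
  order.push('202 sent');
}

// Mocked Express-style response object, just for this sketch:
var res = {
  code: 0,
  status: function (c) { this.code = c; return this; },
  end: function () {}
};

handleCrawlRequest({ url: '/crawl?u=http://example.com' }, res);
console.log(res.code, order); // 202 [ '202 sent' ] -- crawl still pending
```

`setImmediate` is used rather than `process.nextTick` so the deferred callback runs after pending I/O instead of before it; for genuinely CPU-heavy work you'd still want to move it off the event loop entirely (which is where the RabbitMQ worker idea comes back in).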