Tags: javascript, node.js, heroku, worker, long-running-processes

How to design a Node.js worker to handle concurrent long-running jobs


I'm working on a small side project and would like to grow it, but I'm not sure how. My question is: how should I design my Node.js worker application so that it can execute multiple long-running jobs at the same time (e.g. should I be using multiprocessing libraries, a load balancer, etc.)?

My current setup is one Node.js app that only serves web requests and puts jobs on a queue, and a second Node.js app (running on a Heroku worker dyno) that reads from that queue and carries out the jobs. Each job may take anywhere from 1 hour to 1 week and consists purely of writing to a database. Because the job requires a specific npm package, I feel I should be using Node, but I'm not sure it's the best option given that I would like to scale to hundreds of jobs executing at the same time.

Any advice/suggestions as to how I should architect this design would be appreciated. Thank you.


Solution

  • First off, a single node.js app can handle lots of jobs that are just reading/writing from a database, because those activities are mostly asynchronous: node.js spends most of its time idle, waiting for the database to respond to the last request. So a single node.js app could probably handle hundreds of such jobs at once, perhaps even thousands, depending on exactly what the jobs are doing. In fact, I wouldn't be surprised if a single node.js app could throw more work at your database than the database could keep up with.
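To make the point about asynchronous I/O concrete, here is a minimal sketch of one process overlapping many jobs. It assumes each job is a series of awaited database writes; `fakeDbWrite`, `runJob`, and `runAllJobs` are hypothetical names, and the timer only stands in for a real driver call.

```javascript
// Hypothetical stand-in for an asynchronous database write. A real driver
// call that returns a promise would go here instead of a timer.
function fakeDbWrite(delayMs) {
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}

// One "long-running job": a sequence of awaited writes. While a write is
// pending, the event loop is free to advance other jobs.
async function runJob(jobId, writes, delayMs) {
  for (let i = 0; i < writes; i++) {
    await fakeDbWrite(delayMs);
  }
  return jobId;
}

// Start every job at once and wait for all of them inside a single process.
async function runAllJobs(jobCount, writes, delayMs) {
  const started = Date.now();
  const jobs = [];
  for (let id = 0; id < jobCount; id++) {
    jobs.push(runJob(id, writes, delayMs));
  }
  const finished = await Promise.all(jobs);
  return { finished: finished.length, elapsedMs: Date.now() - started };
}

// 500 jobs of 5 writes each, 20 ms per simulated write.
runAllJobs(500, 5, 20).then(({ finished, elapsedMs }) => {
  console.log(`${finished} jobs finished in ${elapsedMs} ms`);
});
```

Because the waits overlap, the elapsed time should be close to the duration of a single job rather than the sum of all of them. With a real database, the limit is more likely to be the database or its connection pool than node.js itself.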

    Then, if you want to scale up the number of worker node.js processes running these jobs, you can start as many workers as you want (and as many as your hardware can handle) using the child_process module. Create one central work queue in your main node.js app, then create a set of child processes whose job is to grab N items from the work queue and process them. I suggest grabbing N items at a time because, thanks to asynchronous I/O to your database, a single node.js process can work on many separate jobs at once.
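The central queue with batch-pulling children could be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the only way to wire it up: the message shapes (`ready`, `batch`, `stop`), the `WorkQueue` class, and the `worker`/`demo` command-line arguments are all invented for illustration. Only `fork`, `child.send`, `process.send`, and `process.disconnect` are real child_process APIs.

```javascript
const { fork } = require('node:child_process');

// Central queue owned by the main process. Jobs here are plain numbers;
// real jobs would carry whatever data the worker needs.
class WorkQueue {
  constructor(jobs) {
    this.jobs = [...jobs];
  }
  // Hand out up to n jobs at once; returns [] once the queue is drained.
  takeBatch(n) {
    return this.jobs.splice(0, n);
  }
}

// Main process: fork workerCount children and answer every 'ready'
// message with the next batch of up to batchSize jobs.
function startPool(jobs, workerCount, batchSize) {
  const queue = new WorkQueue(jobs);
  return new Promise((resolve) => {
    let exited = 0;
    let completed = 0;
    for (let i = 0; i < workerCount; i++) {
      const child = fork(__filename, ['worker']);
      child.on('message', (msg) => {
        if (msg.type !== 'ready') return;
        completed += msg.completed;
        const batch = queue.takeBatch(batchSize);
        if (batch.length > 0) child.send({ type: 'batch', jobs: batch });
        else child.send({ type: 'stop' });
      });
      child.on('exit', () => {
        exited += 1;
        if (exited === workerCount) resolve(completed);
      });
    }
  });
}

// Child process: run all jobs of a batch concurrently, then ask for more.
function runWorker() {
  // Placeholder job: waits 50 ms to imitate asynchronous database writes.
  const runJob = (job) => new Promise((r) => setTimeout(() => r(job), 50));
  process.on('message', async (msg) => {
    if (msg.type === 'stop') {
      process.disconnect(); // close the IPC channel so the child can exit
      return;
    }
    await Promise.all(msg.jobs.map(runJob));
    process.send({ type: 'ready', completed: msg.jobs.length });
  });
  process.send({ type: 'ready', completed: 0 });
}

if (process.argv[2] === 'worker') {
  runWorker();
} else if (process.argv[2] === 'demo') {
  const jobs = Array.from({ length: 20 }, (_, i) => i);
  startPool(jobs, 2, 5).then((n) => console.log(`${n} jobs completed`));
}
```

Saved as, say, `pool.js`, it would be started with `node pool.js demo`. The queue in this sketch lives in memory, so queued jobs are lost if the main process dies. For jobs that run for hours or days, a persistent queue is probably the safer choice.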

    You may also want to explore the cluster module, which doesn't need a work queue of your own: it distributes incoming connections across the clustered instances. You can fire up as many clustered instances of your main app as you want and they all share the workload (both serving web pages and working on the long-running jobs). The usual guideline is one clustered instance per CPU core in the computer, so with 4 cores you would set up a cluster of four instances.
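A minimal sketch of the cluster approach, assuming Node.js 16 or later (for `cluster.isPrimary`) and a plain HTTP server as the "main app". The `workerCountFor` and `startCluster` helpers, the `serve` argument, and port 3000 are illustrative choices, not part of the cluster API.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// The guideline from the answer: one clustered instance per CPU core.
// Falls back to 1 in case the core count cannot be determined.
function workerCountFor(cpuCount) {
  return Math.max(1, cpuCount);
}

function startCluster(port) {
  if (cluster.isPrimary) {
    // Primary process: only forks the instances, it serves no requests.
    const count = workerCountFor(os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    console.log(`primary ${process.pid} started ${count} instances`);
  } else {
    // Each instance listens on the same port; the cluster module
    // distributes incoming connections among them.
    http
      .createServer((req, res) => {
        res.end(`handled by instance ${process.pid}\n`);
      })
      .listen(port);
  }
}

// Only start when asked to, e.g. `node app.js serve`.
if (process.argv[2] === 'serve') {
  startCluster(3000);
}
```

Note that the cluster module balances connections, not queued jobs. In this arrangement a long-running job would run in whichever instance received the request that started it.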