javascript node.js multithreading multiprocessing libuv

Why do we need node clusters if we can change the number of threads utilized by the threadpool?


The title basically says it all: why should we create Node.js workers when we can just change the number of threads the libuv threadpool uses?


Solution

  • The libuv threadpool isn't used for your JavaScript code; it's only used for a subset of Node.js's APIs (though that subset includes one of the most important modules, fs). From the documentation:

    Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous node APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:

    • all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
    • asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
    • dns.lookup()
    • all zlib APIs, other than those that are explicitly synchronous

    So the size of the libuv threadpool helps with lots of overlapping fs and similar calls, but it's not the whole story.

    The libuv pool doesn't help you if you have JavaScript code that needs to do substantial work synchronously; that code runs on a single thread (unless you spin up workers). Moreover, Node.js uses that same thread to check for completions of asynchronous work (including libuv completions). From the event loop page:

    The following diagram shows a simplified overview of the event loop's order of operations.

       ┌───────────────────────────┐
    ┌─>│           timers          │
    │  └─────────────┬─────────────┘
    │  ┌─────────────┴─────────────┐
    │  │     pending callbacks     │
    │  └─────────────┬─────────────┘
    │  ┌─────────────┴─────────────┐
    │  │       idle, prepare       │
    │  └─────────────┬─────────────┘      ┌───────────────┐
    │  ┌─────────────┴─────────────┐      │   incoming:   │
    │  │           poll            │<─────┤  connections, │
    │  └─────────────┬─────────────┘      │   data, etc.  │
    │  ┌─────────────┴─────────────┐      └───────────────┘
    │  │           check           │
    │  └─────────────┬─────────────┘
    │  ┌─────────────┴─────────────┐
    └──┤      close callbacks      │
       └───────────────────────────┘
    

    So that one thread is doing a lot of work even when not running your JavaScript code.

    If you spin up worker threads, they each have their own event loop and can process completions related to their work even if another thread is busy doing something CPU-heavy.

    So whether worker threads are useful in any given situation is highly dependent on what your code is doing.