Node.js | worker_threads | keeping TCP connections alive within workers?


Using worker_threads from Node 12, is it suitable to establish remote connections within the workers and keep those connections alive?

I don't mean sharing a socket between the master and the workers, as we could do with the Node cluster module and fork.

The idea would be to have pools of secure connections already established within the workers to use if needed.

Let's say I have a pool of 10 workers. When a worker is created, it opens some pre-established "TLS" connections (streams) to servers X, Y and Z, and the worker is then marked as "ready".

Each time I use a worker to process "heavy" tasks (mapReduce, etc.), if I need to post data to or get data from server X, Y or Z during the process, I use the appropriate "TLS" connection already established in the pool.

Once the task is completed, the result is returned to the master and the worker simply executes the next task.
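
The setup described above could be sketched roughly as follows. This is only an illustration under several assumptions: plain TCP via `net.connect` stands in for `tls.connect` so the demo runs without certificates, the upper-casing local server is a hypothetical stand-in for "server X", and the names `workerSource`, `runDemo` and the message shapes are made up for this example.

```javascript
const net = require('net');
const { Worker } = require('worker_threads');

// Worker source, inlined so the sketch fits in a single file.
// Assumption: net.connect stands in for tls.connect.
const workerSource = `
const net = require('net');
const { parentPort, workerData } = require('worker_threads');

// Open the connection once, at worker start-up, and keep it for all tasks.
const socket = net.connect(workerData.port, '127.0.0.1', () => {
  socket.setKeepAlive(true, 30000);
  parentPort.postMessage({ type: 'ready' });
});
socket.setEncoding('utf8');

parentPort.on('message', (task) => {
  // Reuse the open socket; one reply chunk per task is assumed here.
  socket.once('data', (reply) => {
    parentPort.postMessage({ type: 'result', id: task.id, reply });
  });
  socket.write(task.payload);
});
`;

function runDemo() {
  return new Promise((resolve, reject) => {
    // Hypothetical "server X": upper-cases whatever it receives.
    const server = net.createServer((conn) => {
      conn.on('error', () => {}); // ignore resets during teardown
      conn.on('data', (chunk) => conn.write(chunk.toString().toUpperCase()));
    });
    server.listen(0, '127.0.0.1', () => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { port: server.address().port },
      });
      worker.on('error', reject);
      worker.on('message', (msg) => {
        if (msg.type === 'ready') {
          // The worker is only given work after it reports "ready".
          worker.postMessage({ id: 1, payload: 'hello' });
        } else if (msg.type === 'result') {
          worker.terminate().then(() => {
            server.close();
            resolve(msg);
          });
        }
      });
    });
  });
}
```

Calling `runDemo()` returns a promise for the worker's result message. In a real pool the worker would stay alive between tasks instead of being terminated, and the reply framing would need to handle responses that span several chunks.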

1) Do you see any side effects or impact of doing so?

2) Would it be better to have the pool of "TLS" connections on the main thread (master)? If remote data is needed within the workers during a task, they would use the "postMessage" method to communicate with the master (and vice versa).
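
For comparison, the second option might look something like the sketch below. It assumes a request/reply protocol over `postMessage` keyed by a request id; `fetchFromPool` is a hypothetical stub standing in for a request sent over a pooled TLS connection, and `runTask`, `fetchViaMain` and the message types are likewise invented for illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined to keep the sketch in one file.
const workerSource = `
const { parentPort } = require('worker_threads');

let nextId = 0;
const pending = new Map(); // request id -> resolve callback

// Ask the main thread for remote data and wait for the matching reply.
function fetchViaMain(server, key) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    parentPort.postMessage({ type: 'fetch', id, server, key });
  });
}

parentPort.on('message', async (msg) => {
  if (msg.type === 'fetch-reply') {
    pending.get(msg.id)(msg.value);
    pending.delete(msg.id);
  } else if (msg.type === 'task') {
    // The "heavy" work is reduced to measuring the fetched string.
    const value = await fetchViaMain('X', msg.key);
    parentPort.postMessage({ type: 'result', value: value.length });
  }
});
`;

// Hypothetical stand-in for a request sent over a pooled TLS connection.
async function fetchFromPool(server, key) {
  return `${server}:${key}`;
}

function runTask(key) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.on('error', reject);
    worker.on('message', (msg) => {
      if (msg.type === 'fetch') {
        // The main thread owns the connections; workers only see the data.
        fetchFromPool(msg.server, msg.key).then((value) => {
          worker.postMessage({ type: 'fetch-reply', id: msg.id, value });
        });
      } else if (msg.type === 'result') {
        worker.terminate().then(() => resolve(msg.value));
      }
    });
    worker.postMessage({ type: 'task', key });
  });
}
```

The trade-off in this variant is that every remote read costs two message hops, and the payload is copied by the structured clone algorithm each way, while all network I/O is funnelled through the main thread's event loop.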

Thanks


Solution

  • Worker threads run inside a single Node.js process, so they cannot themselves be spread across remote machines (a worker can still open its own outbound connections with the net and tls modules). However, you can build your own system that works similarly using TLS sockets. In such a system I would definitely recommend keeping these connections alive: setting them up adds significant latency, while an idle open connection kept in memory uses very few resources.

    Keep in mind that a system like this has some drawbacks:

    1. You are working with different machines, and each of these machines can have its own set of failure conditions.
    2. You are communicating over a network; connections to remote servers might drop suddenly, for any reason imaginable.
    3. You are increasing the physical distance, which adds latency.

    So keep this in the back of your mind.
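
    Because of the second drawback, long-lived connections usually need some reconnect logic. A minimal sketch of one common approach, exponential backoff, is shown here. It is an illustration only: it uses `net.connect` where a real setup would use `tls.connect`, the delay values are arbitrary, and `backoffDelay`, `keepConnected` and the `onConnect` callback are hypothetical names.

    ```javascript
    const net = require('net');

    // Delay before reconnect attempt number `attempt` (0-based): doubles each
    // time, capped at maxMs. The base and cap values are arbitrary assumptions.
    function backoffDelay(attempt, baseMs = 100, maxMs = 10000) {
      return Math.min(maxMs, baseMs * 2 ** attempt);
    }

    // Keep a connection open, reconnecting whenever it closes.
    // `onConnect` is a hypothetical callback that receives each fresh socket.
    function keepConnected(options, onConnect) {
      let attempt = 0;
      let stopped = false;
      let socket = null;
      let timer = null;

      function connect() {
        socket = net.connect(options, () => {
          attempt = 0; // a successful connect resets the backoff
          socket.setKeepAlive(true, 30000);
          onConnect(socket);
        });
        socket.on('error', () => {}); // 'close' follows and drives the retry
        socket.on('close', () => {
          if (!stopped) timer = setTimeout(connect, backoffDelay(attempt++));
        });
      }

      connect();
      return {
        // Stop retrying and drop the current connection.
        stop() {
          stopped = true;
          clearTimeout(timer);
          if (socket) socket.destroy();
        },
      };
    }
    ```

    A caller would pass something like `{ host, port }` as `options` and swap the fresh socket into its pool inside `onConnect`; any request in flight when the connection dropped still has to be retried or failed by the caller.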

    Would I recommend building a system like this? That is hard to say; it depends on your use case, time and budget. You mentioned the cluster nodes are processing 'heavy tasks', which I take to mean CPU- or GPU-intensive work. So a system like this might be a good solution; however, a simple REST API in front of your processing servers might be good enough. Or maybe even database-synchronized servers that just check the database for tasks to execute.

    There are many solutions to the same problem; you just have to consider what works best for your project(s).