In the documentation for Node's Child Process spawn()
function, and in examples I've seen elsewhere, the pattern is to call the spawn()
function, and then to set up a bunch of handlers on the returned ChildProcess
object. For instance, here is the first example of spawn()
given on that documentation page:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
ls.stdout.on('data', (data) => {
console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
console.error(`stderr: ${data}`);
});
ls.on('close', (code) => {
console.log(`child process exited with code ${code}`);
});
The spawn()
function itself is called on the second line. My understanding is that spawn()
starts a child process asynchronously. From the documentation:
The child_process.spawn() method spawns a new process using the given command, with command line arguments in args.
However, the following lines of the script above go on to set up various handlers for the process, so it's assuming that the process hasn't actually started (and potentially finished) between the time spawn()
is called on line 2 and the other stuff happens on the subsequent lines. I know JavaScript/Node is single threaded. However, the operating system is not single threaded, and naively one would read that spawn()
call to be telling the operating system to spawn the process right now (at which point, with unfortunate timing, the OS could suspend the parent Node process and run/complete the child process before the next line of the Node code is executed).
But it must be that the process doesn't actually get spawned until the current JavaScript function completes (or more generally the current JavaScript event handler that called the current function completes), right?
That seems like a pretty important thing to say. Why doesn't it say that in the Child Process documentation page? Is there some overriding Node principle that makes it unnecessary to say that explicitly?
The spawning of the new process starts immediately (it's handed over to the OS to actually fire up the process and get it going). Starting the new process with .spawn()
is asynchronous and non-blocking. So, it will initiate the operation with the OS and immediately return. You might think that that's why it's OK to set up event handlers after it returns (because the process hasn't yet finished starting). Well, yes and no. It likely hasn't yet finished starting the new process, but that isn't the main reason why it's OK.
It's OK, because node.js runs all its events through a single threaded event queue. Thus no events from the newly spawned process can be processed until after your code finishes executing and returns control back to the system. Only then can it process the next event in the event queue and trigger one of the events you are registering handlers for.
Or, said another way, none of the events from the other process are pre-emptive. They won't/can't interrupt your existing Javascript code. So, since you're still running your Javascript code, those events can't get run yet. Instead, they sit in the event queue until your Javascript code finishes and then the interpreter can go get the next event from the event queue and run the callback associated with it. Likewise, that callback runs until it returns back to the interpreter and then the interpreter can get the next event and run its callback and so on...
That's why node.js is called an event-driven system.
As such, it's perfectly fine to do this type of structure:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
ls.stdout.on('data', (data) => {
console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
console.error(`stderr: ${data}`);
});
ls.on('close', (code) => {
console.log(`child process exited with code ${code}`);
});
None of those data
or close
events can execute their callbacks until after your code is done and returns control back to the system. So, it's perfectly safe to set up those event handlers like you are. Even if the newly spawned process was running and generating events right away, those events will just sit in the event queue until your Javascript finishes what it is doing (which includes setting up your event handlers).
Now, if you delayed setting up the event handlers until some future tick of the event loop (as shown below) with something like a setTimeout()
, then you could miss some events:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
setTimeout(() => {
ls.stdout.on('data', (data) => {
console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
console.error(`stderr: ${data}`);
});
ls.on('close', (code) => {
console.log(`child process exited with code ${code}`);
});
}, 10);
Here you are not setting up the event handlers immediately as part of the same tick of the event loop, but after a short delay. Therefore some events could get processed from the event loop before you install your event handlers and you could miss some of these events. Obviously, you would never do it this way (on purpose), but I just wanted to show that code running on the same tick of the event loop does not have a problem, but code running on some future tick of the event loop could have a problem missing events.