Child process exiting with SIGTERM (possibly due to exceeding maxBuffer); how can I confirm that it's a buffer problem and fix it?

Issue

My child process is exiting after about 30 minutes with SIGTERM and no other debug output. Given the information on Node.js child process exits with SIGTERM, I think it's reasonably likely that the process is exiting due to exceeding its maxBuffer, as the uptime is non-deterministic and indeed is improved by increasing the maxBuffer. With the default 205 KB of maxBuffer, it consistently runs for 1-3 minutes; and with 10 MB, it consistently runs for 30-60 minutes.

Aim

The child process is producing a stream of text at an average rate of about 1 MB every 10 minutes (1.66 KB per second).

The log entries in the text stream are multi-line (see sample below of the lines that make up one log entry), so I am using Node to parse them line-by-line to extract the information of interest (from * << Request >> to - End ):

*   << Request  >> 113214123 
-   Begin          req 113214077 rxreq
-   ReqMethod      GET
-   ReqURL         /ping
-   RespStatus     200
-   End

Code

const { exec } = require('child_process');
const { createInterface } = require('readline');

const cp = exec("tail -F 2021-02-25.log", { maxBuffer: 10000000 });

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // (Implementation not shown, as it's hundreds of lines long):
    // Add the line to our line-buffer, and if we've reached "-   End   " yet, parse
    // those lines into a corresponding JS object and clear the line-buffer, ready
    // to receive another line.
    // ...
});

cp.on('close', (code, signal) => {
    console.error(`Child process exiting unexpectedly. Code: ${code}; signal: ${signal}.`);
    process.exit(1);
});

Question

Essentially, "how can I avoid getting the SIGTERM" – but more specifically:

How can I confirm that the SIGTERM is really received due to the child process exceeding the buffer? Is there a way, for example, to inspect the buffer usage of a child process while it's running?
Is the buffer perhaps getting overloaded due to Node spending too long executing the line-parsing function? Is there a way to monitor this?
Am I missing an extra aspect that I need to do, such as manually flushing some buffer?

I think throwing extra buffer at the problem is the wrong way to solve it; 10 MB already seems pretty excessive, and I need to be able to guarantee indefinite uptime (rather than increasing buffer a little more each time it fails).

Solution

How to diagnose that the child process exited due to exceeding its buffer

I searched the Node.js codebase's tests for mentions of maxBuffer and found one that shows how to diagnose that a child process exited due to exceeding its allotted maxBuffer, which I'll copy out here:

// One of the tests from the Node.js codebase:
{
  const cmd =
    `"${process.execPath}" -e "console.log('a'.repeat(1024 * 1024))"`;

  cp.exec(cmd, common.mustCall((err) => {
    assert(err instanceof RangeError);
    assert.strictEqual(err.message, 'stdout maxBuffer length exceeded');
    assert.strictEqual(err.code, 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER');
  }));
}

So I incorporated an equivalent diagnostic function into my app:

const { exec } = require('child_process');
const { createInterface } = require('readline');

/**
 * This termination callback is distinct to listening for the "error" event
 * (which does not fire at all, in the case of buffer overflow).
 * @see https://nodejs.org/api/child_process.html#child_process_event_error
 * @see https://nodejs.org/api/child_process.html#child_process_child_process_exec_command_options_callback
 * @param {import("child_process").ExecException | null} error 
 * @param {string} stdout
 * @param {string} stderr 
 * @type {import("child_process").SpawnOptions}
 */
function terminationCallback(error, stdout, stderr){
    if(error === null){
        // Healthy termination. We'll get an exit code and signal from
        // the "close" event handler instead, so will just defer to those
        // logs for debug.
        return;
    }
    console.log(
        `[error] Child process got error with code ${error.code}.` + 
        ` instanceof RangeError: ${error instanceof RangeError}.` +
        ` Error message was: ${error.message}`
    );
    console.log(`stderr (length ${stderr.length}):\n${stderr}`);
    console.log(`stdout (length ${stdout.length}):\n${stdout}`);
}

const cp = exec(
    "tail -F 2021-02-25.log",
    { maxBuffer: 10000000 },
    terminationCallback
);

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // Implementation not shown
    // ...
});

cp.on('close', (code, signal) => {
    console.error(
        `Child process exiting unexpectedly. ` + 
        `Code: ${code}; signal: ${signal}.`
    );
    process.exit(1);
});

When I ran my app for a few minutes, indeed, I found that this termination callback was called, and that it satisfied all the assertions expected in the Node.js test for a child process that had exited due to exceeding its buffer.

I also noticed that the stdout returned in that termination callback was exactly 1000000 characters long – this matched exactly with the number of bytes that I'd set as the maxBuffer. It was at this point that I came to understand the difference between require("child_process").exec() and require("child_process").spawn().

How to make a child process that can safely stream any amount of data from `stdout`

exec() and spawn() have overlapping functionality, but ultimately are suited for different purposes, which is not really spelled out in the Child Process documentation. The clue is in the construction arguments that they accept.

exec() accepts a termination callback, and its options support maxBuffer (but not stdio).
spawn() doesn't accept a termination callback, and its options support stdio (but not maxBuffer).

The headline here is that:

exec() is suited for tasks that have a definite end (upon which you'd harvest the stdout/stderr that the child process has been accumulating into its buffer the whole time it has been working).
spawn() is suited for tasks that may run indefinitely, as you can configure where the stdout/stderr/stdin streams get piped off to. The default configuration for options.stdio, "pipe", which pipes them to the parent process (your Node.js app), is appropriate in our example where we need to establish a readline interface and consume the stdout line-by-line. There is no explicit buffer limit other than that imposed by the operating system itself (which should be quite generous!).

So, if you're ever writing a Node.js app that manages a child process tasked with anything that runs indefinitely:

watching logs (e.g. tail -F 2021-02-25.log) non-stop and parsing them
running an always-on live streaming service (e.g. ffmpeg <some complex args here>)

... you should be using spawn()!

Conversely, for tasks with a definite end and a predictable, reasonable buffer size (e.g. mkdir -vp some/dir/path or rsync --verbose <src> <dest>), then you can go ahead and use exec()!

There may of course be other distinctions between the two, but this aspect of stream handling is really impactful.

How to rewrite using `spawn()`

Just two lines needs changing (and one of them is merely the import statement)! Note that the default options.stdio value of "pipe" is appropriate here, so we don't even need to pass in an options object.

const { spawn } = require('child_process');
const { createInterface } = require('readline');

const cp = spawn("tail", ["-F", "2021-02-25.log"]);

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // (Implementation not shown, as it's hundreds of lines long):
    // Add the line to our line-buffer, and if we've reached "-   End   " yet, parse
    // those lines into a corresponding JS object and clear the line-buffer, ready
    // to receive another line.
    // ...
});

cp.on('close', (code, signal) => {
    console.error(`Child process exiting unexpectedly. Code: ${code}; signal: ${signal}.`);
    process.exit(1);
});