node.js git large-data child-process

Maximum line count in Node.js: child_process.exec


question

Does Node's child_process have an output limit? And how can I fetch, chunk by chunk, the output of a HUGE command, such as git show on a commit that contains a huge file?


general problem

I am trying to parse the output of git show <commit sha>, and the commit contains a HUGE file (308 344 lines).

When I run git show > showed_by_git.txt, I get the right output, with all the files, for a total of 118 667 lines.

When I run the same command through Node's child_process, I only retrieve 32 602 lines...


my test and its result

I simplified my code and used child_process to count the number of characters and lines it receives before the stream stops.

The result shows that it stops at 32 000+ lines instead of the expected 118 667.

You can reproduce this at home if you have a repository with some HUUUGE file that was committed recently.

const childProcess = require('child_process')

function fetchCommand (command) {
  return new Promise((resolve, reject) => {
    const sub = childProcess.exec(command)

    let chars = 0
    let lines = 0

    sub.stdout.on('data', function (chunk) {
      chars += chunk.length
      // approximate: split() yields one more piece than there are newlines in the chunk
      lines += chunk.split('\n').length
      console.log('chars:' + chars + ' lines:' + lines)
      // logs the char and line count on each chunk of data,
      // then 'forgets' the data: no memory overloading
    })
    sub.stdout.on('close', function () {
      console.log('CLOSED')
      // settle the promise once the stream is done
      resolve({ chars, lines })
    })
    // spawn failures are emitted on the child process itself, not on stderr
    sub.on('error', function (err) {
      console.log('ERROR: ' + err.message)
      reject(err)
    })
  })
}

fetchCommand('git show').catch(err => console.log(err))

output

Here is the output :

C:\Users\guill\.code\git2stats>node examples/fetchTest.js
chars:4096 lines:126
chars:73728 lines:2117
chars:131072 lines:3772
chars:176128 lines:5176
chars:229376 lines:6560
chars:262144 lines:7663
chars:323584 lines:9171
chars:393216 lines:11304
chars:462848 lines:13483
chars:475136 lines:13849
chars:507904 lines:14916
chars:536576 lines:15839
chars:573440 lines:17028
chars:618496 lines:18484
chars:688128 lines:20539
chars:737280 lines:22000
chars:765952 lines:22930
chars:794624 lines:23860
chars:823296 lines:24794
chars:892928 lines:26976
chars:962560 lines:29104
chars:991232 lines:30003
chars:1032192 lines:31292
chars:1073152 lines:32602
CLOSED

You can see that it stops at 32 602 lines, whereas this particular git show has 118 667 lines to show.

the last chunk

I checked the last chunk of data to see if it did something special about the big file, but I can confirm it stops right in the middle of the file


context

I am writing a git statistics tool. The program is well under way: I can already parse git log --stat, then git show <commit sha> for each commit, and return a satisfying JSON.


Solution

  • This is some kind of Node foot-gun that I was previously unaware of.

    Yes, there is a limit. It is configured with the maxBuffer option (see docs), which can be set to Infinity if you like. The idea that this can be set to Infinity is not documented.

    const sub = childProcess.exec(command, { maxBuffer: Infinity });
    

    I am honestly shocked that this limit exists, and I will now be forced to do a code review across a large body of code to find every place where the child_process module is used and see if I need to add a maxBuffer option.

    Consider it an example of how to design an interface poorly.

    Minor note

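    You probably want to handle sub.on('exit', code => {}) or sub.on('close', code => {}), so you can check the exit status of Git and raise an error if the status is not 0. The two events are similar but not identical: 'close' fires only after the process has ended and its stdio streams have closed, so it is the safer choice when you are also reading stdout. Something like this: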

    sub.on('exit', code => {
        if (code === 0) {
            resolve(...);
        } else {
            reject(...);
        }
    });
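
Since the question asks how to read the output chunk by chunk, one alternative is spawn, which hands you the stdout stream directly and does not buffer the output for you, so no maxBuffer limit applies. The following is a minimal sketch under that assumption rather than the method used above; countLines is a hypothetical helper name, and the git arguments in the usage comment are only an example.

```javascript
const { spawn } = require('child_process')

// Hypothetical helper: streams a command's stdout, counts characters and
// newlines chunk by chunk, and never keeps the whole output in memory.
function countLines (file, args) {
  return new Promise((resolve, reject) => {
    // stdin is ignored, stdout is piped to us, stderr goes to the parent's stderr
    const sub = spawn(file, args, { stdio: ['ignore', 'pipe', 'inherit'] })

    let chars = 0
    let lines = 0

    sub.stdout.setEncoding('utf8') // receive strings instead of Buffers
    sub.stdout.on('data', (chunk) => {
      chars += chunk.length
      // count newline characters so chunk boundaries do not skew the total
      for (let i = 0; i < chunk.length; i++) {
        if (chunk[i] === '\n') lines++
      }
    })

    // emitted when the process could not be spawned (e.g. executable not found)
    sub.on('error', reject)

    // 'close' fires after the process has ended and its stdio streams are closed
    sub.on('close', (code, signal) => {
      if (code === 0) {
        resolve({ chars, lines })
      } else {
        reject(new Error(file + ' exited with code ' + code + ' (signal ' + signal + ')'))
      }
    })
  })
}

// Usage (commented out so the snippet has no side effects when loaded):
// countLines('git', ['show']).then(console.log, console.error)
```

Because spawn takes the executable and its arguments separately, no shell is involved, which also avoids quoting issues when a commit sha or path is passed as an argument.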