javascript node.js json stream

How to read a stream of JSON objects one object at a time


I have a binary application that generates a continuous stream of JSON objects (not an array of JSON objects). A JSON object can sometimes span multiple lines (it is still valid JSON, just pretty-printed).

I can connect to this stream and read it without problems, like this:

var child = require('child_process').spawn('binary', ['arg','arg']);

child.stdout.on('data', data => {
  console.log(data);
});

The stream delivers its data as Buffer chunks and emits data events at arbitrary boundaries, so I tried the readline module to split the output into lines. That works (I'm able to JSON.parse() each line) for JSON objects that don't span multiple lines.

The optimal solution would be to listen for an event that delivers a single JSON object, something like:

child.on('json', object => {
   
});

I have noticed the objectMode option in the Node.js streams documentation; however, I'm getting the stream in Buffer format, so I believe I'm unable to use it.

I had a look on npm at pixl-json-stream and json-stream, but in my opinion neither fits the purpose. There is clarinet-object-stream, but it would require building the JSON object from the ground up based on its events.

I'm not in control of the JSON object stream. Most of the time one object is on one line; however, 10-20% of the time a JSON object spans multiple lines (\n as EOL), with no separator between objects. Each new object always starts on a new line.

Sample stream:

{ "a": "a", "b":"b" }
{ "a": "x",
  "b": "y", "c": "z"
}
{ "a": "a", "b":"b" }

There must be a solution already and I'm just missing something obvious. I would rather find an appropriate module than hack a regexp-based stream parser to handle this scenario.


Solution

  • I'd recommend accumulating the lines and trying to parse the accumulated text after every line:

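    const readline = require('readline');
    
    // child is the process spawned in the question
    const rl = readline.createInterface({
      input: child.stdout
    });
    
    let tmp = ''; // lines collected for the object currently being read
    rl.on('line', function(line) {
      // readline strips the line ending; restore it so adjacent lines are not fused
      tmp += line + '\n';
      try {
        const obj = JSON.parse(tmp);
        child.emit('json', obj);
        tmp = '';
      } catch (_) {
        // JSON.parse fails while the object is still incomplete; wait for the next line
      }
    });
    
    child.on('json', function(obj) {
      console.log(obj);
    });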
    

    Because child is an EventEmitter, you can simply call child.emit('json', obj) to raise the custom event.
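
The question also asks about objectMode. The same accumulate-and-parse idea can be wrapped in a Transform stream that takes Buffer chunks in and emits parsed objects out. This is only a minimal sketch under the assumptions stated in the question (each object starts on a new line, and every top-level value is an object); createAssembler and createJsonObjectStream are hypothetical names introduced here, not part of any library.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: collects lines until they form a complete JSON value.
// Returns the parsed value, or undefined while the value is still incomplete.
function createAssembler() {
  let pending = ''; // lines of the object currently being assembled
  return function addLine(line) {
    // Ignore blank lines that appear between objects.
    if (pending === '' && line.trim() === '') return undefined;
    pending += line + '\n';
    try {
      const value = JSON.parse(pending);
      pending = '';
      return value;
    } catch (_) {
      return undefined; // incomplete so far; keep accumulating
    }
  };
}

// Hypothetical Transform: Buffer chunks in, parsed objects out.
function createJsonObjectStream() {
  // StringDecoder keeps multi-byte characters intact across chunk boundaries.
  const decoder = new StringDecoder('utf8');
  const addLine = createAssembler();
  let partial = ''; // text received after the last '\n'

  function emitLines(stream, text, isLast) {
    const lines = (partial + text).split('\n');
    // Unless this is the final flush, the last piece may be an unfinished line.
    partial = isLast ? '' : lines.pop();
    for (const line of lines) {
      const value = addLine(line);
      // Pushing null would end the stream, so a top-level null is skipped.
      if (value !== undefined && value !== null) stream.push(value);
    }
  }

  return new Transform({
    readableObjectMode: true, // the writable side still accepts Buffers
    transform(chunk, _encoding, callback) {
      emitLines(this, decoder.write(chunk), false);
      callback();
    },
    flush(callback) {
      // The last line may lack a trailing '\n'.
      emitLines(this, decoder.end(), true);
      callback();
    },
  });
}

// Usage, assuming child is the spawned process from the question:
// child.stdout.pipe(createJsonObjectStream()).on('data', obj => console.log(obj));
```

Like the readline version, this reparses the accumulated text on every line, which is cheap for objects spanning a handful of lines. If the producer ever emits a malformed object, the accumulated text will never parse, so a production version would probably want a size limit or a reset rule.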