Tags: json, node.js, jsonlines

How to parse a large, newline-delimited JSON file with the JSONStream module in Node.js?


I have a large JSON file in newline-delimited JSON (NDJSON) format, where standard JSON objects are separated by newlines, e.g.

{"name":"1","age":5}
{"name":"2","age":3}
{"name":"3","age":6}

I am using the JSONStream module in Node.js to parse this file, because it is stream-based and therefore suited to large files.

However, neither parse syntax from the examples handles a file with a separate JSON object on each line:

var parser = JSONStream.parse(['rows', true]);
var parser = JSONStream.parse([/./]);

Can someone help me with that?


Solution

  • Warning: since this answer was written, the author of the JSONStream library removed the 'root' event functionality, apparently to fix a memory leak. If you need the 'root' event, use the 0.x.x versions of the library.

    Below is the unmodified original answer:

    From the readme:

    JSONStream.parse(path)

    path should be an array of property names, RegExps, booleans, and/or functions. Any object that matches the path will be emitted as 'data'.

    A 'root' event is emitted when all data has been received. The 'root' event passes the root object & the count of matched objects.

    In your case, since you want to get back the JSON objects as opposed to specific properties, you will be using the 'root' event and you don't need to specify a path.

    Your code might look something like this:

    var fs = require('fs'),
        JSONStream = require('JSONStream');
    
    var stream = fs.createReadStream('data.json', {encoding: 'utf8'}),
        parser = JSONStream.parse();
    
    stream.pipe(parser);
    
    parser.on('root', function (obj) {
      console.log(obj); // whatever you will do with each JSON object
    });