node.js asynchronous large-files

Reading a huge JSON file: how do I know when all the data has been received?


I am having a problem with the asynchronous nature of Node.js. For example, I have the following code, which reads a huge JSON file:

var fs = require('fs');
var JSONStream = require('JSONStream');

var json_spot_parser = function(path){

  this.count = 0;
  var self = this;
  let jsonStream = JSONStream.parse('*');
  let fileStream = fs.createReadStream(path);

  jsonStream.on('data', (item) => {
    // console.log(item) // which correctly logged each JSON object in the file
    self.count++;  // 134,000
  });

  jsonStream.on('end', function () {
    // I know it ends here
  });

  fileStream.pipe(jsonStream);

};

json_spot_parser.prototype.print_count = function(){
  console.log(this.count);
};


module.exports = json_spot_parser;

In another module I use it like this:

   var m_path =   path.join(__dirname, '../..', this.pathes.spots);  
   this.spot_parser = new json_spot_parser(m_path); 
   this.spot_parser.print_count();

I want to read all the JSON objects and process them, but the asynchronous behavior is my problem. I am not familiar with this kind of programming; I am used to sequential programming in languages such as C and C++.

Since I don't know when the program finishes reading the JSON objects, I don't know when or where to process them.

After this.spot_parser = new json_spot_parser(m_path); I expect to be able to work with the JSON objects, but as I said, I can't.

I would like someone to explain how to write a Node.js program in a case like this. I want to know the standard practice. So far I have read some posts, but I believe most of them are short-term fixes.

So, my question is:

How does a Node.js programmer handle problems like this?

Please tell me the standard way; I want to get good at Node.js. Thanks!


Solution

  • You can use callbacks, as @paqash suggested, but returning a promise is a better solution.

    First, return a new Promise from json_spot_parser:

    var fs = require('fs');
    var JSONStream = require('JSONStream');

    // Returns a promise that resolves with the item count once the whole file has been parsed
    var json_spot_parser = function(path){
      return new Promise(function(resolve, reject) {
        var count = 0;
        let jsonStream = JSONStream.parse('*');
        let fileStream = fs.createReadStream(path);

        jsonStream.on('data', (item) => {
          // console.log(item) // logs each JSON object in the file
          count++;  // 134,000
        });

        jsonStream.on('end', function () {
          resolve(count);
        });

        // Reject on read or parse errors so the caller can handle them
        fileStream.on('error', reject);
        jsonStream.on('error', reject);

        fileStream.pipe(jsonStream);
      });
    };

    module.exports = json_spot_parser;
    

    In another module:

    var m_path = path.join(__dirname, '../..', this.pathes.spots);
    // json_spot_parser returns a promise, so call it as a plain function (no `new`)
    this.spot_parser = json_spot_parser(m_path);
    this.spot_parser.then(function(count) { console.log(count); });
    

    As you mentioned, Node.js has an asynchronous model, and you need to learn to think in those terms if you want to be good at Node.js. As a starting point, I suggest this article: Understanding Async Programming in Node.js

    P.S. Try to use camelCase variable names and follow the Airbnb JavaScript Style Guide.
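
For readers coming from sequential languages, the same promise can also be consumed with async/await, which reads top to bottom like C code. The following is only a minimal sketch under some assumptions: countItems and main are hypothetical names that do not appear in the question, and Readable.from stands in for fs.createReadStream(path).pipe(JSONStream.parse('*')) so that the example runs without the JSONStream package or a data file.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: counts 'data' events and resolves once 'end' fires.
function countItems(stream) {
  return new Promise((resolve, reject) => {
    let count = 0;
    stream.on('data', () => { count++; });
    stream.on('end', () => resolve(count));
    stream.on('error', reject);
  });
}

async function main() {
  // Stand-in (assumption) for fs.createReadStream(path).pipe(JSONStream.parse('*'))
  const items = Readable.from([{ id: 1 }, { id: 2 }, { id: 3 }]);

  // Execution of main() pauses here until the stream has ended
  const count = await countItems(items);

  // Everything after the await runs only once all items have been received
  console.log(count); // prints 3
  return count;
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Any processing that depends on the complete data set goes after the await (or inside then); code placed after the call to main() still runs before the stream has finished.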