Search code examples
node.jsasynchronouspromisenode-promisify

Why would promisify cause a loop to take massively longer?


I have an app which has to extract color info from a video and it does this by analyzing each frame. First I extract the frames and then load an array of their locations in memory. As you might imagine, for even a small video it can be in the thousands.

The function I use to extract each frames color info is a promise so I opted to batch an array of promises with Promise.all

With each files absolute path, I read the file with fs and then pass it along to be processed. I've done this with many stand alone images and know the process only takes about a second but suddenly it was taking almost 20min to process 1 image. I finally figured out that using promisify on fs.readFile was what caused the bottle neck. What I don't understand is why?

In the first one fs.readFile is transformed inside of the promise that's returned while in the second one fs.readFile is just used as it normally would be and I wait for resolve to be called. I don't mind using the non-promise one, I'm just curious why this would cause such a slow down?

The second I stopped using promisify the app sped back up to 1 frame / second

The slow code:

   async analyzeVideo(){
    await this._saveVideo();
    await this._extractFrames();
    await this._removeVideo();

    const colorPromises = this.frameExtractor.frames.map(file => {
      return new Promise(resolve => {
        //transform image into data
        const readFile = promisify(fs.readFile);
        readFile(file)
            .then(data => {
              const analyzer = new ColorAnalyzer(data);
              analyzer.init()
                  .then(colors => {
                    resolve(colors)
                  })
            })
            .catch((e)=> console.log(e));
      })
    });
    const colors = await runAllQueries(colorPromises);

    await this._removeFrames();

    this.colors = colors;

    async function runAllQueries(promises) {
      const batches = _.chunk(promises, 50);
      const results = [];
      while (batches.length) {
        const batch = batches.shift();
        const result = await Promise.all(batch)
            .catch(e=>console.log(e));
        results.push(result)
      }
      return _.flatten(results);
    }
  }

The fast code:

async analyzeVideo(){
    await this._saveVideo();
    await this._extractFrames();
    await this._removeVideo();
    const colorPromises = this.frameExtractor.frames.map(file => {
      return new Promise(resolve => {
        //transform image into data
        fs.readFile(file, (err, data) => {
          const analyzer = new ColorAnalyzer(data);
          analyzer.init()
              .then(colors => {
                resolve(colors)
              })
        });
      })
    });
    const colors = await runAllQueries(colorPromises);

    await this._removeFrames();

    this.colors = colors;

    async function runAllQueries(promises) {
      const batches = _.chunk(promises, 50);
      const results = [];
      while (batches.length) {
        const batch = batches.shift();
        const result = await Promise.all(batch)
            .catch(e=>console.log(e));
        results.push(result)
      }
      return _.flatten(results);
    }
  }

Solution

  • You don't need to promisify in each loop iteration, just do it once at the top of the module.

    Most likely the issue is caused by Promises that are never settled. You are not handling the error correctly, so Promise.all may never finish if an error is thrown.

    Instead of logging the error in .catch, you'll have to reject too, or resolve at least if you don't care about the errors. Also analyzer.init() errors are not being catched (if that function can reject)

    const readFile = promisify(fs.readFile);
    // ...
    
    const colorPromises = this.frameExtractor.frames.map(file => {
       return new Promise((resolve, reject) => {
            //transform image into data
            // const readFile = promisify(fs.readFile);
            readFile(file)
                .then(data => {
                  const analyzer = new ColorAnalyzer(data);
                  return analyzer.init()   
                })
                .then(resolve) // colors
                .catch((e)=> { 
                  reject(e);
                  console.log(e)
                });
          })
    })
    

    Aside from that runAllQueries is not doing what you think it's doing. You already executed all the promises.

    I recommend you use p-limit instead

    const pLimit = require('p-limit');
    
    const limit = pLimit(50);
    
    /* ... */
    
    const colorPromises = this.frameExtractor.frames.map(file => {
       return limit(() => {
            return readFile(file)
                .then(data => {
                  const analyzer = new ColorAnalyzer(data);
                  return analyzer.init()   
                })
                .then(resolve) // colors
        })
    })
    
    const colors = await Promise.all(colorPromises);
    

    Furthermore, if you're executing 50 reads at a time, you should increase the value of UV_THREADPOOL_SIZE which defaults to 4.

    At your entry point, before any require:

    process.env.UV_THREADPOOL_SIZE = 64 // up to 128
    

    Or call the script as: UV_THREADPOOL_SIZE=64 node index.js