Tags: javascript, node.js, gulp, archiverjs

How to create a ZIP file with Gulp that contains a lot of files?


I have a Gulp task that adds lots of files to a ZIP file (more than 2700 in one case, but it can be several thousand in other cases). The code is as follows:

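const fs = require('fs');
const gulp = require('gulp');
const through = require('through2');
const archiver = require('archiver')('zip'); // an archive instance, despite the name

gulp.task('zip', () => { // the task name is illustrative
  let zip = fs.createWriteStream('my-archive.zip');
  return gulp.src('app/**/*')
    .pipe(through.obj((file, encoding, cb) => {
      let pathInZip = '...';
      // isADirectory() is a helper not shown here
      if (!isADirectory(file.path)) { // Do not zip the directory itself
        // createReadStream() opens the file immediately, but archiver only
        // consumes it later, so every file stays open in the meantime
        archiver.append(fs.createReadStream(file.path), {
          name: pathInZip,
          mode: fs.statSync(file.path).mode // entry permissions (a number)
        });
      }
      cb(null, file);
    }, cb => {
      // Now create the ZIP file!
      archiver.pipe(zip);
      archiver.finalize();
      cb();
    }));
});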

This code works on small projects, but when it deals with more than 2000 files, I get the following error:

events.js:154
throw er; // Unhandled 'error' event
^

Error: EMFILE: too many open files, open 'd:\dev\app\some\file'
at Error (native)

So I understand that having 2000+ files open at the same time before writing them to the ZIP is not a good idea.

How can I get the ZIP file written without needing to open all the files at the same time?

Thanks.

Environment: Node 5.5.0 / npm 3.8.5 / archiver 1.0.0 / Windows


Solution

  • Gulp already takes care of a lot of the things you're trying to do:

    • gulp.src() reads file contents and makes an fs.stat() call for each file. It then stores both as file.contents and file.stat on the vinyl-file objects it emits.
    • It does so by using the graceful-fs package, which automatically backs off in case of an EMFILE error and retries when another file has closed. That prevents the "too many open files" problem you're experiencing.
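The underlying idea is to never have thousands of descriptors open at once. The sketch below uses a different, simpler technique than graceful-fs: a fixed cap on in-flight reads, rather than retrying an open after it fails with EMFILE. `createLimiter`, `readAllLimited` and the limits are hypothetical names and values chosen for illustration, and the code assumes a modern Node (10 or later, for `fs.promises` and `Promise.prototype.finally`), not the Node 5.5 from the question.

```javascript
const fs = require('fs');

// Hypothetical helper: runs async tasks with at most `max` in flight.
// This is a fixed cap, not graceful-fs's retry-on-EMFILE queue.
function createLimiter(max) {
  let active = 0;
  const waiting = []; // start functions for tasks not yet running

  function next() {
    if (active >= max || waiting.length === 0) return;
    active++;
    const start = waiting.shift();
    start();
  }

  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push(() => {
        Promise.resolve()
          .then(task) // also turns a synchronous throw into a rejection
          .then(resolve, reject)
          .finally(() => {
            active--; // free the slot, then start the next queued task
            next();
          });
      });
      next();
    });
  };
}

// Hypothetical usage: read many files, keeping at most `max` open at a time.
function readAllLimited(paths, max) {
  const run = createLimiter(max);
  return Promise.all(paths.map((p) => run(() => fs.promises.readFile(p))));
}
```

A fixed cap is predictable but has to be tuned to the platform's descriptor limit; reacting to EMFILE, as graceful-fs does, adapts automatically.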

    Unfortunately, you're not taking advantage of either of those, because:

    • You're making explicit calls to fs.statSync() and fs.createReadStream(). There's really no need for that since gulp has already done that for you. You're effectively reading each file twice (and creating twice the number of file descriptors in the process).
    • You're circumventing gulp's built-in protection against EMFILE by making direct use of the fs module which does not have any guards against the "too many open files" problem.
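As a rough stdlib illustration of what gulp.src() hands you (assuming a modern Node with `fs.promises`): each file is read and stat-ed exactly once, the descriptor is closed before the object is returned, and later steps reuse `contents` and `stat` instead of calling `fs.createReadStream()` or `fs.statSync()` again. `loadFile` and `loadSequentially` are hypothetical helpers, not part of gulp or vinyl.

```javascript
const fs = require('fs');

// Hypothetical vinyl-like loader: one object per file carrying both the
// contents and the fs.Stats, each obtained exactly once.
async function loadFile(filePath) {
  const [contents, stat] = await Promise.all([
    fs.promises.readFile(filePath), // opens, reads and closes the file
    fs.promises.stat(filePath),     // leaves nothing open afterwards
  ]);
  return { path: filePath, contents, stat };
}

// Load files one after another, so at most one file is open at a time.
async function loadSequentially(paths) {
  const files = [];
  for (const p of paths) {
    files.push(await loadFile(p));
  }
  return files;
}
```

Downstream steps then read `file.contents` and `file.stat` rather than touching the file system again.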

    I've rewritten your code to take advantage of gulp's features. I've also tried to make it a little more gulp-idiomatic, e.g. by using gulp-filter to get rid of the directories:

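    const gulp = require('gulp');
    const fs = require('graceful-fs');
    const archiver = require('archiver');
    const through = require('through2');
    const filter = require('gulp-filter');

    gulp.task('default', () => {
      // Create a fresh archive on every run: an archive instance
      // cannot be reused once it has been finalized.
      const archive = archiver('zip');
      const zip = fs.createWriteStream('my-archive.zip');
      archive.pipe(zip);
      return gulp.src('app/**/*')
        // Drop directories using the fs.Stats that gulp.src() already attached
        .pipe(filter((file) => !file.stat.isDirectory()))
        .pipe(through.obj((file, encoding, cb) => {
          const pathInZip = '...';
          // Reuse the buffer and stats gulp has already read: no extra open/stat
          archive.append(file.contents, {
            name: pathInZip,
            mode: file.stat.mode // `mode` expects a number, not an fs.Stats object
          });
          cb(null, file);
        }, cb => {
          // Signal completion only once the ZIP has been fully written
          zip.on('finish', cb);
          archive.finalize();
        }));
    });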
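In the rewritten task, the flush callback waits for the output stream's 'finish' event instead of calling `cb()` right after `finalize()` as the question's code does, so gulp does not consider the task done before the ZIP has been written out. The sketch below reproduces that pattern with only Node's `stream` module (the test uses `Readable.from`, so Node 12 or later is assumed). `collectInto` is a hypothetical stand-in for the `through.obj()` step, and `output.end()` plays the role of `archiver.finalize()`, which in the real task ends the archive and, through `pipe()`, the ZIP write stream.

```javascript
const { Transform } = require('stream');

// Hypothetical stand-in for the through.obj() step: passes each object
// downstream, writes a line for it to `output`, and delays its flush
// callback until `output` emits 'finish' (cf. zip.on('finish', cb)).
function collectInto(output) {
  return new Transform({
    objectMode: true,
    transform(item, _encoding, cb) {
      // Backpressure from `output` is ignored here for brevity.
      output.write(`${item}\n`);
      cb(null, item);
    },
    flush(cb) {
      output.once('error', cb);
      output.once('finish', () => cb());
      // Stands in for archiver.finalize(): no more entries will be added.
      output.end();
    },
  });
}
```

Because the readable side only ends after the flush callback runs, anything waiting for the stream to end (gulp, in the real task) observes a complete output file.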