Search code examples
javascriptsqlnode.jspostgresqlnode-postgres

Import sql file in node.js and execute against PostgreSQL


I'm looking for an efficient way to take a raw sql file and have it executed synchronously against a postgres database, akin to if you ran it through psql.

I have an sql file which creates all databases, imports data, etc. I need to execute this using node.js but cannot find any module which does this automatically. For the node.js application itself, we use node-postgres ('pg'), knex.js and bookshelf.js. I assume though that pg is best for this.

One alternative I can think of is to read the full file, split it by semicolons, replace newlines with spaces, trim any duplicate space, then feed it into pg one by one in a manner that they're executed sequentially, not asynchronously. I'm a little surprised if this is truly the most efficient way and also if no libraries exist yet to solve this. I'm a little hesitant to jump into it seeing as SQL syntax can itself be a little challenging and I might accidentally mash it up.

Some clarifications in advance:

  • psql cannot be used as it's not installed on the target machine
  • I've chosen to develop and source control sql statements in sql native form, because it's a lot easier for a DBA to use and manipulate it

Solution

  • I've written the following function which works for my case. It would have been much more simpler if it weren't for:

    • Using batch to manage concurrency
    • Having the tricky PostgreSQL COPY case to consider

    Code snippet:

    function processSQLFile(fileName) {
    
      // Extract SQL queries from files. Assumes no ';' in the fileNames
      var queries = fs.readFileSync(fileName).toString()
        .replace(/(\r\n|\n|\r)/gm," ") // remove newlines
        .replace(/\s+/g, ' ') // excess white space
        .split(";") // split into all statements
        .map(Function.prototype.call, String.prototype.trim)
        .filter(function(el) {return el.length != 0}); // remove any empty ones
    
      // Execute each SQL query sequentially
      queries.forEach(function(query) {
        batch.push(function(done) {
          if (query.indexOf("COPY") === 0) { // COPY - needs special treatment
            var regexp = /COPY\ (.*)\ FROM\ (.*)\ DELIMITERS/gmi;
            var matches = regexp.exec(query);
            var table = matches[1];
            var fileName = matches[2];
            var copyString = "COPY " + table + " FROM STDIN DELIMITERS ',' CSV HEADER";
            var stream = client.copyFrom(copyString);
            stream.on('close', function () {
              done();
            });
            var csvFile = __dirname + '/' + fileName;
            var str = fs.readFileSync(csvFile);
            stream.write(str);
            stream.end();
          } else { // Other queries don't need special treatment
            client.query(query, function(result) {
              done();
            });
          }
        });
      });
    }
    

    Beware that this would fail if you used semicolons anywhere except to terminate SQL statements.