Tags: javascript, node.js, readfile, fs, commonjs

Node.js and the Module Scope: Most Efficient Way to Read Files into Memory


I'm trying to better understand Node.js module scope and require in the context of variable instantiation, specifically when reading files into memory.

I have an HTTP server with a module that reads static SQL files stored in the codebase and executes the queries they contain. For example:

'use strict';

const fs = require('fs');
const executeSql = require('./utils/execute-sql');

module.exports.getDataById = (id) => {
  const sql = fs.readFileSync(
    `./data-access/sql/getDataById.sql`, 'utf8'
  );

  return executeSql(sql, id);
}

module.exports.getDataByName = (name) => {
  const sql = fs.readFileSync(
    `./data-access/sql/getDataByName.sql`, 'utf8'
  );

  return executeSql(sql, name);
}

My understanding is that each time these functions (getDataById and getDataByName) are called, the file is read synchronously, which blocks the execution thread. I know I can read the files asynchronously to avoid this, but what I'm really curious about is whether pulling the sql variables out of the functions and into the module scope means the readFileSync operations only happen once (when the node process is instantiated), which would ultimately be more efficient. For example:

'use strict';

const fs = require('fs');
const executeSql = require('./utils/execute-sql');
const sql1 = fs.readFileSync(
  `./data-access/sql/getDataById.sql`, 'utf8'
);
const sql2 = fs.readFileSync(
  `./data-access/sql/getDataByName.sql`, 'utf8'
);

module.exports.getDataById = (id) => {
  return executeSql(sql1, id);
}

module.exports.getDataByName = (name) => {
  return executeSql(sql2, name);
}

I know that require loads modules synchronously on the initialization of the node process, and caches those modules should they be required elsewhere. What I'm trying to understand is whether standard variable declarations NOT using require result in a similarly instantiated memory reference that persists for the lifetime of the node process, without needing to be re-instantiated each time the module is required.

I appreciate any insight you can provide.


Solution

  • You are right. When a module is required, its code is executed only the first time; every later require just returns the cached exports. So in your second example fs.readFileSync will be run once (the first time something requires the module). Node.js caches the exports object, and subsequent requires return that same object without running the code again.

    You can test that with something like this:

    var mod = require("./myModule");
    console.log(mod.nonExistantProperty); // This will log undefined
    mod.nonExistantProperty = "yay";
    
    var requireagain = require("./myModule");
    console.log(requireagain.nonExistantProperty); // This will log yay
    

    In the second require, instead of executing the module code again, it will just return the object that was cached, so you can see the modifications you made before requiring it the second time.

    With that in mind: in your first example you export functions whose bodies run every time they are called, so a readFileSync call inside the function will run on every call.

    Your second approach is what is usually done to improve performance: the module code runs only once (on the first require), and each time the exported functions are called they read the variable that already holds the file contents. Kudos to you for reaching that conclusion :-) keep it up.
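
To see the difference between the two approaches in numbers, here is a minimal, self-contained sketch. It is not the asker's real code: the temp directory, the file names module-scope.js and per-call.js, the global.demoReads counter, and the returned { sql, id } object (a stand-in for executeSql) are all assumptions made up for the demo. It writes two tiny modules to a temp folder, requires them, and counts how often each one reads the SQL file.

```javascript
'use strict';

const fs = require('fs');
const os = require('os');
const path = require('path');

// Temporary stand-ins for the question's files (all names here are hypothetical).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'module-scope-demo-'));
const sqlPath = path.join(dir, 'getDataById.sql');
fs.writeFileSync(sqlPath, 'SELECT * FROM data WHERE id = ?;');

// Shared counters so the generated modules can report how often they read the file.
global.demoReads = { moduleScope: 0, perCall: 0 };

// Variant A: the file is read at module scope, i.e. while the module body runs.
fs.writeFileSync(path.join(dir, 'module-scope.js'), `
'use strict';
const fs = require('fs');
global.demoReads.moduleScope += 1;
const sql = fs.readFileSync(${JSON.stringify(sqlPath)}, 'utf8');
module.exports.getDataById = (id) => ({ sql, id });
`);

// Variant B: the file is read inside the exported function, i.e. on every call.
fs.writeFileSync(path.join(dir, 'per-call.js'), `
'use strict';
const fs = require('fs');
module.exports.getDataById = (id) => {
  global.demoReads.perCall += 1;
  const sql = fs.readFileSync(${JSON.stringify(sqlPath)}, 'utf8');
  return { sql, id };
};
`);

// Require variant A twice: the second require returns the cached exports object.
const a1 = require(path.join(dir, 'module-scope.js'));
const a2 = require(path.join(dir, 'module-scope.js'));
const b = require(path.join(dir, 'per-call.js'));

// Call each exported function three times.
for (let id = 1; id <= 3; id += 1) {
  a1.getDataById(id);
  b.getDataById(id);
}

console.log(a1 === a2);        // true
console.log(global.demoReads); // { moduleScope: 1, perCall: 3 }
```

The module-scope variant reads the file once no matter how many times it is required or called, while the per-call variant reads it on every call. One thing the sketch sidesteps by using absolute paths: a relative path passed to fs.readFileSync is resolved against the process's current working directory, not against the module's own location.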