Search code examples
javascriptloggingcompilationv8

JS string concatenation optimization


Recently I've been implementing a simple logger for JS library, the example looks like this:

class Logger {
  static level: 'error' | 'info' = 'error';

  static error(message: string) {
    if (this.level === 'error') {
      console.error(message);
    }
  }

  static info(message: string) {
    if (this.level === 'info') {
      console.info(message);
    }
  }
}

function app() {
  Logger.level = 'info';

  Logger.error('Error happened at ' + new Date());
  Logger.info('App started at ' + new Date());
}

It works, however I noticed some of my collegues implementing the same functionality in Java and C# are using lambdas (or callbacks) to pass the message to logger, for them the logger looks like this:

class Logger {
  static level: 'error' | 'info' = 'error';

  static error(message: () => string) {
    if (this.level === 'error') {
      console.error(message());
    }
  }

  static info(message: () => string) {
    if (this.level === 'info') {
      console.info(message());
    }
  }
}

function app() {
  Logger.level = 'info';

  Logger.error(() => 'Error happened at ' + new Date());
  Logger.info(() => 'App started at ' + new Date());
}

The argument for the second implementation is that only one log level is active at the time, so if the error is turned on then info method will never call the message() thus the string won't be concatenated allowing for more optimized code.

It makes sense, but after many attempts at testing it I couldn't find any difference at execution times and memory usage. I tried really complex concatenations with several dozens of + both for callbacks and plain concat but the numbers seems to be the same.

I have a feeling that modern browsers compilers are good at optimizing conditional concatenation, but I couldn't find any good evidence after looking at AST Explorer and Compiler Explorer.

Question : Does anyone know if such an optimization makes sense for modern JS and if yes is there any valid references to how to check in practice?


Solution

  • I benchmarked your 2 versions, and replaced the body of the app function by

      for (let i = 0; i < 200000; i++) {
        Logger.error('Error happened at ' + new Date());
        Logger.info('App started at ' + new Date());
      }
    

    (or the same thing with lambdas) And I removed the console.log (while still making sure to call message() in the version with lambdas).

    The version with the lambdas takes about 0.5sec and without the lambdas it takes about 1sec.

    So, it looks like lambdas do help indeed. However, the cost of doing console.log is larger than the cost of doing a string concatenation and a new Date(). Indeed, if I re-add the console.log, both version have the same performance. If your prints are typically not executed (like for instance, if level is expected to be info in production, but your code contains a lot of Logger.error), then using Lambdas makes things faster indeed.
    I compared a version of your code where the main loop only contains Logger.info while Logger.level is error (==> nothing gets logged) and the version with the lambdas is about 15 times faster than the version without the lambdas. The main cost comes from doing new Date(), as it calls a version and allocates memory.

    The best thing to do though, would probably be to put the new Date() inside the error and info helpers. String concatenation is extremely cheap is V8 because it uses "cons strings": when you do "aaaaaaaaa" + "bbbbbbbbb", V8 doesn't really create a string "aaaaaaaaabbbbbbbbb" but instead create a ConsString object, which contains 2 pointers: one to "aaaaaaaaa" and one to "bbbbbbbbb". Only when you try to read the string (which would happen when printing it), V8 collapses it into a regular sequential string.


    Still, because you tagged the question with the "compilation" tag, here are some explanations of how V8 should, in theory, be able to make both versions (with and without lambda) have a similar speed, thanks to a mix of inlining, constant folding, dead code elimination, and speculative optimizations and deoptimizations. Here is how this could work.

    Inlining. Small functions are almost always inlined by Turbofan (V8's top-tier compiler). Larger functions are also inlined if we have enough inlining budget. So, in your app function,

    Logger.error('Error happened at ' + new Date());
    Logger.info('App started at ' + new Date());
    

    will be inlined. At this point, app contains thus more or less:

    if (this.level === 'error') {
        console.error('Error happened at ' + new Date());
    }
    if (this.level === 'info') {
        console.info('App started at ' + new Date());
    }
    

    Then, two different things could happen:

    Constant folding. Turbofan might be able to detect that this.level is constant (it was assigned to a given value and has never been changed so far), and then replace this.level by its value, which would turn the checks into <some value> === 'error' and <some value> === 'info'. Then, Turbofan would evaluate these checks at compile time, replacing them with true or false. Turbofan will then realize that the body of some of these IFs is unreachable, and remove them (that's called dead code elimination). In practice though, 'Error happened at ' + new Date() used to be an argument to the function, and is thus computed before the body of the IF; the code really looks more like

    let message = 'Error happened at ' + new Date();
    if (this.level === 'error') {
        console.log(message);
    }
    

    As such, dead code elimination would mainly remove console.log(message), and it might then remove the string concatenation (because its result is unused), but it won't remove new Date() because it's not smart enough to realize that calling new Date() doesn't have side effects.

    Note that constant folding in that case would be speculative: if this.level were to change later, V8 would throw away the code generated by Turbofan (since it would now be incorrect), and recompile.

    If Turbofan cannot speculate on the value of this.value (because it's not constant for instance, or because you've called console.error on different Logger objects), V8 might still be able to optimize your code.

    In particular, Turbofan doesn't generate assembly code for JS code for which it doesn't have feedback. Instead, it generates deoptimizations, which are special instructions that switch back to the interpreter. So (still after inlining), if every time you've executed the function, this.level was info, the Turbofan will generate something like

    if (this.level === 'error') {
        Deoptimize();
    }
    if (this.level === 'info') {
        console.info('App started at ' + new Date());
    }
    

    As a result, 'Error happened at ' + new Date() is not in the graph at all anymore and is thus "free". If you ever change this.level and hit this case, the Deoptimize() instruction will take care of going back to the bytecode interpreter in order to execute the console.error('Error happened at ' + new Date());. In practice, this is once again not so easy, because 'Error happened at ' + new Date() is initially computed before the if and not inside. And so the graph looks more like

    let message = 'Error happened at ' + new Date();
    if (this.level === 'error') {
        Deoptimize(message)
    }
    

    (the Deoptimization always takes some inputs describe the values currently alive in the function).

    At this point, code motion could still help you and move let message = 'Error happened at ' + new Date(); to inside the if so that it's not computed when you don't need it. In practice though, Turbofan's code motion is not extremely powerful, and this won't happen, because new Date() looks a little bit too opaque for Turbofan.


    A final note: concatenation of constant strings is usually performed at compile time by Turbofan. This doesn't apply to + Date() of course, but if you try to benchmark "hello " + "world", then this would be like benchmarking "hello world".