Search code examples
javascriptvariableshoisting

How does hoisting work if JavaScript is an interpreted language?


My understanding of an interpreter is that it executes program line by line and we can see the instant results, unlike compiled languages which convert code, then executes it.

My question is, in Javascript, how does interpreter come to know that a variable is declared somewhere in the program and logs it as undefined?

Consider the program below:

function do_something() {
  console.log(bar); // undefined (but in my understanding about an interpreter, it should be throwing error like variable not declared)
  var bar = 111;
  console.log(bar); // 111
}

Is implicitly understood as:

function do_something() {
  var bar;
  console.log(bar); // undefined
  bar = 111;
  console.log(bar); // 111
}

How does this work?


Solution

  • This concept of 'var hoisting' is quite a confusing one if you think of it on the surface. You have to delve into how the language itself works. JavaScript, which is an implementation of ECMAScript, is an interpreted language, meaning all the code you write is fed into another program that in turn, interprets the code, calling certain functions based on parts of your source code.

    For example, if you write:

    function foo() {}
    

    The interpreter, once it meets your function declaration, will call a function of its own called FunctionDeclarationInstantiation that creates the function. Instead of compiling JavaScript into native machine code, the interpreter executes C, C++, and machine code of its own 'on demand' as each part of your JavaScript code is read. It does not necessarily mean line-by-line, all interpreted means it that no compilation into machine code happens. A separate program that executes machine code reads your code and executes that machine code on the fly.

    How this has to with var declaration hoisting or any declaration for that matter, is that the interpreter first reads through all your code once without executing any actual code. It analyzes the code and separates it into chunks, called lexical environment. Per the ECMAScript 2015 Language Specification:

    8.1 Lexical Environments

    A Lexical Environment is a specification type used to define the association of Identifiers to specific variables and functions based upon the lexical nesting structure of ECMAScript code. A Lexical Environment consists of an Environment Record and a possibly null reference to an outer Lexical Environment. Usually a Lexical Environment is associated with some specific syntactic structure of ECMAScript code such as a FunctionDeclaration, a BlockStatement, or a Catch clause of a TryStatement and a new Lexical Environment is created each time such code is evaluated.

    An Environment Record records the identifier bindings that are created within the scope of its associated Lexical Environment. It is referred to as the Lexical Environment’s EnvironmentRecord

    Before any code is executed, the interpreter goes through your code and for every lexical structure, such as a function declaration, a new block, etc, a new lexical environment is created. And in those lexical environments, an environment record records all the variables declared in that environment, their value, and other information about that environment. That's what allows for JavaScript to manage variable scope, variable lookup chains, this value, etc.

    Each lexical environment is associated with a code realm:

    8.2 Code Realms

    Before it is evaluated, all ECMAScript code must be associated with a Realm. Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript global environment, all of the ECMAScript code that is loaded within the scope of that global environment, and other associated state and resources.

    Every section of JavaScript/ECMAScript code you write is associated with a realm before any of the code is actually executed. Each realm consists of the intrinsic values used by the specific section of code associated with the realm, the this object for the realm, a lexical environment for the realm, among other things.

    This means each lexical part of your code is analyzed before executing. Then a realm is created that houses all the information on that set of code. The source, what variables are needed to execute it, which variables have been declared, what this is, etc. In the case of var declarations, a realm is created, when you define a function like you did here:

    function do_something() {
      console.log(bar); // undefined
      var bar = 111;
      console.log(bar); // 111
    }
    

    Here, a FunctionDeclaration creates a new lexical environment, associated with a new realm. When a lexical environment is created, the interpreter analyzes the code and finds all declarations. Those declarations are then first processed at the very beginning of that lexical environment, thus the 'top' of the function:

    13.3.2 Variable Statement

    A var statement declares variables that are scoped to the running execution context’s VariableEnvironment. Var variables are created when their containing Lexical Environment is instantiated and are initialized to undefined when created.

    Thus, whenever a lexical environment is instantiated (created), all the var declarations are created, initialized to undefined. That means they are processed before any code is executed, at the 'top' of the lexical environment:

    var bar; //Processed and declared first
    console.log(bar);
    bar = 111;
    console.log(bar);
    

    Then, after all your JavaScript code is analyzed, it is finally executed. Because the declaration was processed first, it is declared (and initialized to undefined) giving you undefined.

    Hoist is kind of a misnomer really. Hoist implies that the declarations are moved directly to the top of the current lexical environment, but instead the code is analyzed before execution; nothing is moved.


    Note: let and const act in the same way and are also hoisted but this won't work:

    function do_something() {
      console.log(bar); //ReferenceError
      let bar = 111;
      console.log(bar);
    }
    

    This will give you a ReferenceError for trying to access an uninitialized variable. Even though let and const declarations are hoisted, the specification explicitly states that you cannot access them before they are initialized, unlike var:

    13.3.1 Let and Const Declarations

    let and const declarations define variables that are scoped to the running execution context’s LexicalEnvironment. The variables are created when their containing Lexical Environment is instantiated but may not be accessed in any way until the variable’s LexicalBinding is evaluated.

    Thus, you can't access the variable until it is formally initialized, whether to undefined or any other value. That means you can't seemingly 'access it before it's declared' like you can with var.